Find and extract strings from a long list using a repeating pattern

Time: 2022-09-13 11:29:07

I have two lists of text from which I would like to extract certain information.

The first line (first few terms) looks like

line = "{"af":"16.63","al":"11.58",..."

I would like to extract only the letter codes inside the quotes into a list, if possible, e.g. ["af","al",...].

The second line is very long and contains a sequence which looks like

line = "...,"name":"Papua New Guinea"},..."

I just want the string that follows "name": in each "name":"<country>" pair to go into another list, if possible, e.g. [...,"Papua New Guinea",...]. The same pattern "name":"<country>"} appears again and again; I would just like the countries.

Both could perhaps be piped into two lists in different files using sed. I just need to get rid of all of the surrounding "fluff".

I've tried several regex combinations, but none of them work; I can't get the syntax right. Thanks in advance.

1 solution

#1


You are looking at JSON data; use the json module to parse this into Python structures. The rest of your tasks are then easy:

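import json

# Keys of the first JSON object, e.g. ['af', 'al', ...]
first_structure = json.loads(line)
print(list(first_structure.keys()))

# Country names; this assumes the second text is a JSON list of objects
second_structure = json.loads(countries_text)
print([d['name'] for d in second_structure])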
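
For a version that can be run end to end, here is a minimal sketch of the same approach. The sample strings are made up to match the fragments shown in the question (the real lines are much longer), the "code" key and the helper names extract_codes and extract_names are hypothetical, and it assumes the second line is a JSON array of objects:

```python
import json

# Hypothetical sample data shaped like the fragments in the question.
line = '{"af":"16.63","al":"11.58"}'
countries_text = (
    '[{"code":"pg","name":"Papua New Guinea"},'
    '{"code":"al","name":"Albania"}]'
)


def extract_codes(text):
    """Return the top-level keys of a JSON object, in document order."""
    return list(json.loads(text).keys())


def extract_names(text):
    """Return the "name" value of every object in a JSON array."""
    return [entry["name"] for entry in json.loads(text)]


codes = extract_codes(line)
names = extract_names(countries_text)
print(codes)
print(names)
```

If the two lists need to end up in separate files, each one could be written out one item per line with "\n".join(...). Parsing with json rather than a regex avoids having to handle quoting and escaping by hand.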
