Convert a comma-separated string to a list, ignoring commas inside quotes

Time: 2022-09-23 00:09:56

How do I convert "1,,2'3,4'" into a list? Commas separate the individual items, unless they are within quotes. In that case, the comma is to be included in the item.

This is the desired result: ['1', '', '2', '3,4']. One regex I found on another thread to ignore the quotes is as follows:

re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')

But this gives me this output:

['', '1', ',,', "2'3,4'", '']

I can't understand where these extra empty strings are coming from, or why the two commas are being returned at all, let alone together.
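
The question does not show how the regex was called, but the output above can be reproduced with re.split, so the following sketch assumes that was the call. The pattern matches the fields, not the separators, so split treats every field as a delimiter and returns the text between fields (the empty strings and the ',,'). Because the pattern is wrapped in a capturing group, the matched fields are returned as well.

```python
import re

s = "1,,2'3,4'"

# The regex from the question: it matches whole fields, not separators.
field = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')

# split() treats each matched field as a delimiter and returns the text
# around the matches; the capturing group adds the matches themselves.
split_result = field.split(s)
print(split_result)  # ['', '1', ',,', "2'3,4'", '']

# findall() returns only the matched fields. The empty field between the
# two commas is lost, because "+" requires at least one character.
found = field.findall(s)
print(found)  # ['1', "2'3,4'"]
```

Neither call yields the desired list on its own, which is one reason a CSV-aware parser is a better fit here than this pattern.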

I tried making this regex myself:

re.compile(r'''(, | "[^"]*" | '[^']*')''')

which ended up not matching anything, and just returned my original input.

I don't understand why; shouldn't it detect the commas at the very least? The same problem occurs if I add a ? after the comma.
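
One probable cause, again assuming re.split was used: without the re.VERBOSE flag the spaces inside the pattern are literal characters. The first alternative then only matches a comma followed by a space, and the input contains no spaces. Adding a ? after the comma does not help, because the space is still required. The sketch below compares the two modes.

```python
import re

s = "1,,2'3,4'"

# Without re.VERBOSE the spaces are literal, so ", " needs a comma
# followed by a space. Nothing matches and split() returns the input.
literal = re.compile(r'''(, | "[^"]*" | '[^']*')''')
literal_result = literal.split(s)
print(literal_result)  # ["1,,2'3,4'"]

# With re.VERBOSE the whitespace in the pattern is ignored, so the commas
# and the quoted section are detected (and returned, because of the group).
verbose = re.compile(r'''(, | "[^"]*" | '[^']*')''', re.VERBOSE)
verbose_result = verbose.split(s)
print(verbose_result)  # ['1', ',', '', ',', '2', "'3,4'", '']
```

Even in verbose mode the result still contains the separators and an extra empty string, so this pattern alone does not produce the desired list.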

2 solutions

#1


10  

Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:

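from csv import reader
from io import StringIO  # Python 3; Python 2 used cStringIO.StringIO

# Wrap the string in a file-like object so csv.reader can iterate over it.
file_like_object = StringIO("1,,2,'3,4'")
# quotechar="'" makes single quotes protect the commas inside them.
csv_reader = reader(file_like_object, quotechar="'")
for row in csv_reader:
    print(row)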

This results in the following output:

['1', '', '2', '3,4']
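
For a single string, the file-like wrapper is optional, since csv.reader accepts any iterable of lines. A minimal Python 3 sketch, where split_quoted is a hypothetical helper name that is not part of the csv module:

```python
import csv


def split_quoted(line, quotechar="'"):
    """Split one comma-separated line, keeping commas inside quotes.

    split_quoted is an illustrative helper, not a csv API.
    """
    # csv.reader takes any iterable of strings, so a one-element list works.
    return next(csv.reader([line], quotechar=quotechar))


result = split_quoted("1,,2,'3,4'")
print(result)  # ['1', '', '2', '3,4']
```

This uses the corrected input with a comma after the 2, as in the answer above.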

#2


7  

pyparsing includes a predefined expression for comma-separated lists:

>>> from pyparsing import commaSeparatedList
>>> s = "1,,2'3,4'"
>>> print(commaSeparatedList.parseString(s).asList())
['1', '', "2'3", "4'"]

Hmm, looks like you have a typo in your data, missing a comma after the 2:

>>> s = "1,,2,'3,4'"
>>> print(commaSeparatedList.parseString(s).asList())
['1', '', '2', "'3,4'"]
