How do I split a string on different delimiters while keeping some of those delimiters in the output? (Tokenizing a string)

Time: 2023-01-11 21:47:54

More specifically, I want to split a string on any non-alphanumeric character, but when the delimiter is not whitespace I want to keep it in the output. That is, for the input:

my_string = 'Hey, I\'m 9/11 7-11'

I want to get:

['Hey' , ',' , 'I' , "'" , 'm', '9' , '/' , '11', '7' , '-' , '11']

With no whitespace as a list element.

I have tried the following:

re.split('([/\'\-_,.;])|\s', my_string)

But this outputs:

['Hey', ',', '', None, 'I', "'", 'm', None, '9', '/', '11', None, '7', '-', '11']

How do I solve this without 'unnecessary' iterations?

I also have some trouble escaping the backslash character, since '\\\\' does not seem to work. Any ideas on how to solve this as well?

Thanks a lot.

1 answer

#1


3  

You may use

import re
my_string = "Hey, I'm 9/11 7-11"
print(re.findall(r'\w+|[^\w\s]', my_string))
# => ['Hey', ',', 'I', "'", 'm', '9', '/', '11', '7', '-', '11']

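The \w+|[^\w\s] regex matches either one or more word characters (letters, digits, or underscores) or a single character that is neither a word character nor whitespace. Because whitespace matches neither alternative, re.findall skips it, so no empty strings or None values appear in the result.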
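To make the two alternatives concrete, here is a minimal sketch that applies the same pattern to a couple of inputs. The helper name `tokenize` and the second sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in the answer: a run of word characters, or a single
# character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r'\w+|[^\w\s]')

def tokenize(text):
    """Hypothetical helper: return word runs and non-space punctuation."""
    return TOKEN_RE.findall(text)

print(tokenize("Hey, I'm 9/11 7-11"))
# ['Hey', ',', 'I', "'", 'm', '9', '/', '11', '7', '-', '11']

# The underscore counts as a word character, so it stays inside the token,
# and consecutive punctuation marks come out one per element.
print(tokenize("a_b  c!?"))
# ['a_b', 'c', '!', '?']
```

Note that an underscore is treated as part of a word here, which differs from the character class in the question, where `_` was listed as a delimiter.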

By the way, to match a backslash with a regex, use \\ in a raw string literal (r'\\') or four backslashes in a regular one ('\\\\'). Raw string literals are the recommended way to define regex patterns in Python.
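The two spellings of the backslash pattern can be compared directly. The following is only a sketch; the sample value `path` is an assumed example string, not something from the question.

```python
import re

# Assumed sample input: the text C:\temp\file (each '\\' is one backslash).
path = 'C:\\temp\\file'

raw_pattern = r'\\'      # raw literal: two characters, regex for one backslash
plain_pattern = '\\\\'   # regular literal: the same two characters

# Both literals produce the identical pattern string.
assert raw_pattern == plain_pattern

# Splitting on a literal backslash drops the delimiter.
print(re.split(raw_pattern, path))
# ['C:', 'temp', 'file']

# The tokenizing pattern from the answer already keeps backslashes,
# because a backslash is neither a word character nor whitespace.
tokens = re.findall(r'\w+|[^\w\s]', path)
print(tokens)
# ['C', ':', '\\', 'temp', '\\', 'file']
```

In the printed list, each `'\\'` is the repr of a single backslash character.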
