Python regex to split a string into numbers and text/symbols

Time: 2022-02-06 02:15:45

I would like to split a string into sections of numbers and sections of text/symbols. My current code doesn't handle negative numbers or decimals, and it behaves oddly: it adds an empty list element to the end of the output.


import re
mystring = 'AD%5(6ag 0.33--9.5'
newlist = re.split(r'([0-9]+)', mystring)
print(newlist)

Current output:


['AD%', '5', '(', '6', 'ag ', '0', '.', '33', '--', '9', '.', '5', '']

Desired output:


['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']

3 solutions

#1


1  

Your regex matches one or more digits, so each run of digits acts as a delimiter. Because the pattern is wrapped in a capturing group, the matched digits are also added to the resulting list, between the text before and the text after each match. If the string ends with digits, the text after the last match is an empty string, and that empty string is added to the end of the list.
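To make the behaviour described above concrete, here is a small sketch that reuses the pattern from the question. The two short strings ('ab12cd' and '12ab') are made up for illustration and do not appear in the original post.

```python
import re

pattern = r'([0-9]+)'

# The string from the question ends with a digit, so the "text after the
# last delimiter" is the empty string.
parts = re.split(pattern, 'AD%5(6ag 0.33--9.5')
print(parts)

# Illustrative strings (assumptions, not from the question):
middle = re.split(pattern, 'ab12cd')   # digits in the middle: no empty strings
leading = re.split(pattern, '12ab')    # digits at the start: leading ''
print(middle)
print(leading)
```

The same rule applies at both ends: whenever a match touches the start or the end of the string, re.split() reports the empty text on that side.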


You may split with a regex that matches float or integer numbers with an optional minus sign and then remove empty values:


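import re
s = 'AD%5(6ag 0.33--9.5'  # the string from the question
result = re.split(r'(-?\d*\.?\d+)', s)
result = list(filter(None, result))  # drop empty strings; list() is needed on Python 3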

To match negative/positive numbers with exponents, use:


r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'
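As a quick check of the exponent pattern, the following sketch applies it with re.split() and drops empty strings. The input string 'a1e5b-2.5E-3' is a hypothetical example chosen for illustration.

```python
import re

# Pattern from the answer: optional sign, optional integer part, optional
# dot, required digits, then an optional exponent such as e5 or E-3.
exp_pattern = r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'

sample = 'a1e5b-2.5E-3'  # hypothetical input, not from the question
tokens = [t for t in re.split(exp_pattern, sample) if t]
print(tokens)
```

The exponent part sits in a non-capturing group, (?:...), so re.split() still returns each number as a single list element.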

The -?\d*\.?\d+ regex matches:

  • -? - an optional minus
  • \d* - zero or more digits
  • \.? - an optional literal dot
  • \d+ - one or more digits.
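The breakdown above can also be written directly into the pattern with re.VERBOSE, which lets each piece carry its own comment. This is only a sketch; the name NUMBER and the test string are assumptions made for illustration.

```python
import re

# Same pattern as above, spelled out piece by piece.
NUMBER = re.compile(r"""
    -?      # an optional minus
    \d*     # zero or more digits
    \.?     # an optional literal dot
    \d+     # one or more digits
""", re.VERBOSE)

# Hypothetical input covering a leading dot, a negative integer,
# a decimal, and a trailing dot.
found = NUMBER.findall('x.5 y-12 z3.14 w7.')
print(found)
```

Because the pattern must end in at least one digit, it accepts a leading dot ('.5') but leaves a trailing dot ('7.') outside the match.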

#2


1  

Unfortunately, re.split() does not offer an "ignore empty strings" option. However, to retrieve your numbers, you could easily use re.findall() with a different pattern:


import re

string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')
numbers = rx.findall(string)

print(numbers)
# ['5', '6', '0.33', '-9.5']
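If the matches are needed as numeric values rather than strings, one possible follow-up (not part of the answer itself) is to convert them with float(). The variable name values is an assumption made for this sketch.

```python
import re

string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')

# findall() returns the matched substrings; float() turns each into a number.
values = [float(n) for n in rx.findall(string)]
print(values)
```

Note that this approach keeps only the numbers and discards the text/symbol sections, so it answers a narrower question than the split-based solutions.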

#3


1  

As mentioned here before, there is no option to ignore the empty strings in re.split() but you can easily construct a new list the following way:


import re

mystring = "AD%5(6ag0.33--9.5"
newlist = [x for x in re.split(r'(-?\d+\.?\d*)', mystring) if x != '']
print(newlist)

Output:


['AD%', '5', '(', '6', 'ag', '0.33', '-', '-9.5']
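The split-and-filter approach can be wrapped in a small helper, which also makes it easy to compare the two number patterns used in the answers. This is a sketch under stated assumptions: split_numbers is a hypothetical name, and 'a.5b7.c' is a made-up input chosen to show where the patterns differ.

```python
import re

def split_numbers(text, pattern=r'(-?\d*\.?\d+)'):
    """Split text into number and non-number parts, dropping empty strings."""
    return [part for part in re.split(pattern, text) if part]

# The string from the question gives the desired output.
question = split_numbers('AD%5(6ag 0.33--9.5')
print(question)

# -?\d*\.?\d+ accepts a leading dot but not a trailing one ...
leading_dot = split_numbers('a.5b7.c')
# ... while -?\d+\.?\d* accepts a trailing dot but not a leading one.
trailing_dot = split_numbers('a.5b7.c', r'(-?\d+\.?\d*)')
print(leading_dot)
print(trailing_dot)
```

Which pattern is preferable depends on whether inputs such as '.5' or '7.' should count as numbers in your data.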
