Using re.split and a pattern with regular expressions in Python

Time: 2022-05-10 03:53:20

I have a string like this:

string ='ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

I want to take out =E2=82=AC and =20

But when I use,

import re
pattern = r'(=\w\w)+'
a = re.split(pattern, string)

it returns

['ArcelorMittal invests ', '=AC', '87m in new process that cuts emissions', '=20', '']
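
The extra '=AC' and '=20' entries appear because re.split also returns the text of any capturing group in the pattern, and a repeated group keeps only its last repetition. A minimal sketch, assuming the goal is just the text between the escapes: a non-capturing group (?:...) drops those entries. The variable names with_capture and without_capture are illustrative and not from the original post.

```python
import re

string = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Capturing group: re.split inserts the last repetition of the group into the result.
with_capture = re.split(r'(=\w\w)+', string)

# Non-capturing group: only the text between the matches is returned.
without_capture = re.split(r'(?:=\w\w)+', string)

print(with_capture)
print(without_capture)
```

The trailing empty string remains in both cases because the input ends with a match.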

2 solutions

#1


You may use re.findall

>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> re.findall(r'(?:=\w{2})+', s)
['=E2=82=AC', '=20']
>>> 

Use re.sub if you want to remove those chars.

>>> re.sub(r'(?:=\w{2})+', '', s)
'ArcelorMittal invests 87m in new process that cuts emissions'
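
One caveat with \w{2} is that it matches any two word characters after '=', not only hex digits, so ordinary text such as 'key=value' would also be trimmed. A possible tightening, assuming every escape is '=' followed by exactly two hex digits: the helper name strip_hex_escapes and the sample string are hypothetical, added here only for illustration.

```python
import re

# Runs of "=XX" where XX are two hexadecimal digits (assumed escape format).
HEX_ESCAPE_RUN = re.compile(r'(?:=[0-9A-Fa-f]{2})+')

def strip_hex_escapes(text):
    """Remove every run of =XX hex escapes from text (hypothetical helper)."""
    return HEX_ESCAPE_RUN.sub('', text)

sample = 'key=value costs =E2=82=AC5'

# The looser \w{2} pattern also removes "=va" from "key=value".
loose = re.sub(r'(?:=\w{2})+', '', sample)

# The hex-only pattern leaves "key=value" untouched.
strict = strip_hex_escapes(sample)

print(loose)
print(strict)
```

Either way, removing the escapes also discards the character they encode, so this only fits cases where that character is not needed.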

#2


Based on your comment, I would recommend using quopri.decodestring on the original string. There is no need to extract these characters and decode them separately.

>>> import quopri
>>> s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'
>>> quopri.decodestring(s)
'ArcelorMittal invests \xe2\x82\xac87m in new process that cuts emissions '
>>> print quopri.decodestring(s)
ArcelorMittal invests €87m in new process that cuts emissions
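
The session above is Python 2, where str is a byte string and print is a statement. Under Python 3, quopri.decodestring returns bytes, so an explicit decode step is needed to get text back. A rough equivalent, assuming the escaped bytes are UTF-8 (E2 82 AC is the UTF-8 encoding of the euro sign); the names raw and text are illustrative.

```python
import quopri

s = 'ArcelorMittal invests =E2=82=AC87m in new process that cuts emissions=20'

# Encode the ASCII text to bytes first, then undo the quoted-printable escapes.
raw = quopri.decodestring(s.encode('ascii'))

# Assumption: the payload is UTF-8. Use the message's declared charset if it differs.
text = raw.decode('utf-8')

print(text)
```

If the string comes from an email, the charset is normally declared in the Content-Type header, and that value should replace the hard-coded 'utf-8'.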
