How to use a regular expression to replace dashes between characters with spaces

Date: 2022-07-07 16:50:23

I want to use a regex to replace dashes that appear between letters with a space. For example, ab-cd should become ab cd.

The following matches the letter-dash-letter sequence, but it also replaces the surrounding letters (i.e. ab-cd becomes a d rather than the ab cd I want):

 new_term = re.sub(r"[A-z]\-[A-z]", " ", original_term)

How do I adapt the above to replace only the - part?

4 solutions

#1


6  

You need to capture the characters before and after the - to a group and use them for replacement, i.e.:

import re
subject = "ab-cd"
# flags is passed by keyword so it cannot be confused with the count argument
subject = re.sub(r"([a-z])\-([a-z])", r"\1 \2", subject, flags=re.IGNORECASE)
print(subject)
# ab cd

DEMO

http://ideone.com/LAYQWT


REGEX EXPLANATION

([a-z])\-([a-z])

Match the regex below and capture its match into backreference number 1 «([a-z])»
   Match a single character in the range between “a” and “z” «[a-z]» (case-insensitive because of re.IGNORECASE)
Match the character “-” literally «\-»
Match the regex below and capture its match into backreference number 2 «([a-z])»
   Match a single character in the range between “a” and “z” «[a-z]» (case-insensitive because of re.IGNORECASE)

\1 \2

Insert the text that was last matched by capturing group number 1 «\1»
Insert the character “ ” literally « »
Insert the text that was last matched by capturing group number 2 «\2»
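
As a rough sketch of the same idea packaged for reuse: the helper name dashes_to_spaces and the constant LETTER_DASH_LETTER below are made up for illustration, and the character class [A-Za-z] assumes only ASCII letters matter. The \g<1> form is just the explicit spelling of \1.

```python
import re

# Hypothetical names, for illustration only.
# [A-Za-z] limits the match to ASCII letters on both sides of the dash.
LETTER_DASH_LETTER = re.compile(r"([A-Za-z])-([A-Za-z])")

def dashes_to_spaces(text):
    # \g<1> and \g<2> re-insert the captured letters around the new space.
    return LETTER_DASH_LETTER.sub(r"\g<1> \g<2>", text)

print(dashes_to_spaces("ab-cd"))       # ab cd
print(dashes_to_spaces("x - y, 3-4"))  # x - y, 3-4 (no letter-dash-letter here)
```

Dashes surrounded by spaces or digits are left alone because neither group can match there.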

#2


6  

Use references to capturing groups:

>>> import re
>>> original_term = 'ab-cd'
>>> re.sub(r"([A-Za-z])\-([A-Za-z])", r"\1 \2", original_term)
'ab cd'

This assumes, of course, that you can't just do original_term.replace('-', ' ') for whatever reason. Perhaps your text uses hyphens where it should use en dashes or something.
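
To make the difference concrete, here is a small comparison on a made-up input string (the sample text and variable names are assumptions, not from the question). It shows where a plain replace and the capturing-group regex diverge.

```python
import re

# Made-up sample: one dash between letters, one between digits, one leading dash.
text = "ab-cd, 10-20, -x"

# str.replace touches every hyphen, wherever it appears.
plain_result = text.replace("-", " ")

# The regex only touches a hyphen that has an ASCII letter on both sides.
regex_result = re.sub(r"([A-Za-z])-([A-Za-z])", r"\1 \2", text)

print(repr(plain_result))  # 'ab cd, 10 20,  x'
print(repr(regex_result))  # 'ab cd, 10-20, -x'
```

If every hyphen in the input should become a space, the plain replace is simpler and is probably the better choice.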

#3


2  

re.sub() always replaces the whole matched sequence with the replacement.
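
A quick way to see this is to look at what the pattern from the question actually matches. The snippet below is only a demonstration; it uses [A-Za-z] in place of the question's [A-z], which does not change the result for this input.

```python
import re

original_term = "ab-cd"
pattern = r"[A-Za-z]-[A-Za-z]"

# The match covers the dash AND the letter on each side of it.
match = re.search(pattern, original_term)
print(match.group(0))  # b-c

# So the whole "b-c" is swapped for a single space.
broken = re.sub(pattern, " ", original_term)
print(broken)  # a d
```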

Lookahead and lookbehind assertions let you replace only the dash. They check the surrounding characters but are not part of the matched sequence.

new_term = re.sub(r"(?<=[A-Za-z])\-(?=[A-Za-z])", " ", original_term)

The syntax is explained in the Python documentation for the re module.
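
One practical consequence of assertions not consuming characters shows up when single letters are separated by dashes. The comparison below is a sketch on a made-up input; with capturing groups the middle letter is consumed by the first match and cannot serve as the left-hand letter of the next one.

```python
import re

term = "a-b-c"  # made-up input with adjacent letter-dash-letter sequences

# Capturing groups consume "a-b", so the scan resumes at "-c" with no letter before it.
with_groups = re.sub(r"([A-Za-z])-([A-Za-z])", r"\1 \2", term)

# Lookarounds consume only the dash, so "b" is still available for the second match.
with_lookarounds = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", term)

print(with_groups)       # a b-c
print(with_lookarounds)  # a b c
```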

#4


1  

You need to use look-arounds:

 new_term = re.sub(r"(?i)(?<=[A-Z])-(?=[A-Z])", " ", original_term)

Or capturing groups:

 new_term = re.sub(r"(?i)([A-Z])-([A-Z])", r"\1 \2", original_term)

See IDEONE demo

Note that [A-z] also matches some non-letters (namely [, \, ], ^, _, and `), so I suggest replacing it with [A-Z] and using the case-insensitive modifier (?i).
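
The extra characters can be checked directly, since they are exactly the ASCII code points that sit between Z and a. The snippet below is a small verification sketch; the sample string "a_-_b" is made up to show a dash that [A-z] treats as being between letters.

```python
import re

# The ASCII characters that fall between "Z" and "a".
between = "".join(chr(cp) for cp in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# Every one of them is matched by [A-z].
all_matched = re.findall(r"[A-z]", between) == list(between)
print(all_matched)  # True

sample = "a_-_b"  # the dash sits between two underscores
loose = re.sub(r"(?<=[A-z])-(?=[A-z])", " ", sample)
strict = re.sub(r"(?i)(?<=[A-Z])-(?=[A-Z])", " ", sample)
print(loose)   # a_ _b  (underscore wrongly counted as a letter)
print(strict)  # a_-_b  (left alone)
```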

Note that you do not have to escape a hyphen outside a character class.
