Python Basics (06) -- Regular Expressions

Date: 2022-06-24 22:16:16

1 Atoms
(1) Ordinary characters as atoms

 import re
 pattern = "baidu"
 string = "www.baidu.com"
 result = re.search(pattern, string)
 print(result)

Output:

 <_sre.SRE_Match object; span=(4, 9), match='baidu'>
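The printed value is a match object. As a minimal sketch (the variable names here are my own, not from the original example), its methods return the matched text and its position. Newer Python versions (3.7 onward, as far as I know) print it as `<re.Match object; ...>` instead of `<_sre.SRE_Match object; ...>`; the content is the same.

```python
import re

match = re.search("baidu", "www.baidu.com")

# re.search returns None when nothing matches, so check before using it.
if match is not None:
    matched_text = match.group()   # the substring that matched
    start, end = match.span()      # half-open index range [start, end)
    print(matched_text, start, end)

# A pattern that does not occur anywhere in the string gives None.
no_match = re.search("google", "www.baidu.com")
print(no_match)
```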

(2) Non-printing characters as atoms

 import re
 pattern = '\n'
 string = "www.baidu.com\n2017-12-16\n"
 result = re.search(pattern, string)
 print(result)

Output:

 <_sre.SRE_Match object; span=(13, 14), match='\n'>

(3) Generic characters as atoms

 import re
 pattern = r"\w\dpython\w"
 string = "abc333python_py"
 result = re.search(pattern, string)
 print(result)

Output:

 <_sre.SRE_Match object; span=(4, 13), match='33python_'>
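Patterns that contain backslashes are usually written as raw strings (the `r"..."` prefix), so that Python does not interpret the backslash before the regex engine sees it. A small sketch of the difference, using a sample string of my own:

```python
import re

text = "a cat sat"

# In a normal string "\b" is the backspace character (ASCII 8),
# so this pattern looks for backspace + "cat" and finds nothing.
plain = re.search("\bcat", text)

# In a raw string the backslash reaches the regex engine intact,
# so r"\b" means "word boundary".
raw = re.search(r"\bcat", text)

print(plain)
print(raw.span())
```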

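Notes:

Sequence Meaning
\w Matches a letter, digit, or underscore
\W Matches any character that is not a letter, digit, or underscore
\s Matches any whitespace character; for ASCII text this is [ \t\n\r\f\v]
\S Matches any non-whitespace character
\d Matches any decimal digit; equivalent to [0-9] for ASCII text
\D Matches any non-digit character
\A Matches at the start of the string
\Z Matches at the end of the string. In Python's re it matches only at the very end; in some other flavors it also matches just before a trailing newline
\z Matches at the end of the string in Perl-style flavors. Older versions of Python's re reject it, so use \Z there
\G Matches where the previous match ended, in flavors such as Perl and PCRE. Python's re does not support it
\b Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string)
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never"
\n, \t, etc. Match the corresponding non-printing character
\1…\9 Matches the content of the n-th group
\10 Matches the content of group 10 if that group exists. Some flavors otherwise read it as an octal character code; Python's re treats it as a group reference and raises an error when the group is missing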
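A few of the sequences above can be checked directly. This is a minimal sketch on a sample string of my own, not part of the original examples:

```python
import re

text = "never verb\n"

# \s matches whitespace: the space and the trailing newline.
whitespace = re.findall(r"\s", text)

# er\B needs a word character right after "er": true in "verb", not in "never".
non_boundary = re.search(r"er\B", text)

# er\b needs a boundary right after "er": true at the end of "never".
boundary = re.search(r"er\b", text)

# In Python, \Z matches only at the very end of the string (after the newline),
# whereas $ also matches just before a trailing newline.
end_z = re.search(r"verb\Z", text)
end_dollar = re.search(r"verb$", text)

print(whitespace, non_boundary.span(), boundary.span(), end_z, end_dollar.span())
```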

(4) Atom tables (character sets)

import re
string = "abc123pythonp_py"
pattern1 = r"\w\dpython[a-z]\w"
pattern2 = r"\w\dpython[^a-z]\w"
pattern3 = r"\w\dpython[a-z]\W"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
print(result1)
print(result2)
print(result3)

Output:

 <_sre.SRE_Match object; span=(4, 14), match='23pythonp_'>
 None
 None
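To see the three forms of an atom table side by side (an explicit list, a negated range, and combined ranges), here is a short sketch on a made-up string:

```python
import re

text = "cat cot cut c9t"

# [aou] accepts exactly one of the listed characters.
listed = re.findall(r"c[aou]t", text)

# [a-z] is a range; [^a-z] negates it (any one character except a-z).
not_lower = re.findall(r"c[^a-z]t", text)

# Several ranges can be combined inside one set.
alnum = re.findall(r"c[a-z0-9]t", text)

print(listed, not_lower, alnum)
```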

2 Metacharacters
(1) The match-anything metacharacter

import re
pattern = "...Python."
string = "ILove123Python_py"
print(re.search(pattern, string))
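Each `.` stands for exactly one character. The sketch below reuses the string from the example and also shows a detail worth knowing: by default `.` does not match a newline unless the `re.S` (`re.DOTALL`) flag is passed. The second test string is my own.

```python
import re

# Three characters are needed before "Python" and one after it.
first = re.search(r"...Python.", "ILove123Python_py")
print(first.group(), first.span())

# By default "." does not match a newline.
default_dot = re.search(r"a.b", "a\nb")

# With re.S (re.DOTALL) it does.
dotall_dot = re.search(r"a.b", "a\nb", re.S)

print(default_dot, dotall_dot is not None)
```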

(2) Boundary metacharacters

import re
# Match strings that start with ILove
pattern1 = "^ILove"
# Match strings that start with Love
pattern2 = "^Love"
# Match strings that end with py
pattern3 = "py$"
# Match strings that end with ny
pattern4 = "ny$"
string = "ILove123Python_py"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
result4 = re.search(pattern4, string)
print(result1)
print(result2)
print(result3)
print(result4)

Output:

<_sre.SRE_Match object; span=(0, 5), match='ILove'>
None
<_sre.SRE_Match object; span=(15, 17), match='py'>
None
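By default `^` and `$` anchor to the whole string. With the `re.M` (`re.MULTILINE`) flag they also anchor at every line break. A minimal sketch with a two-line string of my own:

```python
import re

text = "first line\nsecond line"

# Without flags, ^ anchors only at the start of the whole string.
single = re.findall(r"^\w+", text)

# With re.M, ^ and $ also anchor at the start and end of each line.
multi_start = re.findall(r"^\w+", text, re.M)
multi_end = re.findall(r"\w+$", text, re.M)

print(single, multi_start, multi_end)
```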

(3) Quantifiers

import re
string = "ILoveAndccccc123Python_py"
# Py, then any characters (as many as possible), then n
pattern1 = "Py.*n"
# d followed by exactly two c
pattern2 = "dc{2}"
# d followed by exactly three c
pattern3 = "dc{3}"
# d followed by at least two c
pattern4 = "dc{2,}"
result1 = re.search(pattern1, string)
result2 = re.search(pattern2, string)
result3 = re.search(pattern3, string)
result4 = re.search(pattern4, string)
print(result1)
print(result2)
print(result3)
print(result4)

Output:

<_sre.SRE_Match object; span=(16, 22), match='Python'>
<_sre.SRE_Match object; span=(7, 10), match='dcc'>
<_sre.SRE_Match object; span=(7, 11), match='dccc'>
<_sre.SRE_Match object; span=(7, 13), match='dccccc'>
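Besides `{n}` and `{n,}`, the common quantifiers are `*`, `+`, `?`, and `{m,n}`. The sketch below compares them on a few sample strings; `samples` and `full_matches` are hypothetical names introduced only for this illustration.

```python
import re

samples = ["d", "dc", "dcc", "dcccc"]

def full_matches(pattern):
    """Return the samples that the pattern matches in full (illustrative helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

zero_or_more = full_matches(r"dc*")      # * : zero or more c
one_or_more = full_matches(r"dc+")       # + : one or more c
zero_or_one = full_matches(r"dc?")       # ? : zero or one c
two_to_three = full_matches(r"dc{2,3}")  # {2,3} : two or three c

print(zero_or_more, one_or_more, zero_or_one, two_to_three)
```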

(4) Alternation

import re
string = "ILoveAndccccc123Python_py"
pattern = "Love|Python"
print(re.search(pattern, string))

Output:

<_sre.SRE_Match object; span=(1, 5), match='Love'>
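Two details of `|` are easy to miss: `re.search` returns the leftmost match regardless of the order of the alternatives, and `|` has the lowest precedence, so it splits the whole pattern unless parentheses limit it. A sketch (the "gray grey" string is my own example):

```python
import re

text = "ILoveAndccccc123Python_py"

# search returns the leftmost match, whichever alternative produces it.
leftmost = re.search(r"Python|Love", text)

# findall collects every non-overlapping match of either alternative.
both = re.findall(r"Love|Python", text)

# (?:...) groups without capturing, so the alternation covers only a|e.
grouped = re.findall(r"gr(?:a|e)y", "gray grey")

# Without the group, the alternatives are "gra" and "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")

print(leftmost.group(), both, grouped, ungrouped)
```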

(5) Grouping

import re
pattern1 = "(cd){1,}"
pattern2 = "cd{1,}"
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)  
result2 = re.search(pattern2, string)
print(result1)
print(result2)

Output:

<_sre.SRE_Match object; span=(2, 8), match='cdcdcd'>
<_sre.SRE_Match object; span=(2, 4), match='cd'>
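Parentheses do more than bundle atoms for a quantifier: they also capture the text they matched, which can be read back with `group(n)` or referenced inside the pattern with `\1`. A minimal sketch with sample strings of my own:

```python
import re

# Each pair of parentheses captures one part of the date.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "www.baidu.com 2017-12-16")
year, month, day = date.group(1), date.group(2), date.group(3)

# \1 refers back to the text captured by group 1,
# so this finds a word that is immediately repeated.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")

print(year, month, day)
print(repeated.group(), repeated.group(1))
```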

3 Pattern modifiers (flags)

import re
pattern1 = "python"
pattern2 = "python"
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)  
result2 = re.search(pattern2, string, re.I)
print(result1)
print(result2)

Output:

None
<_sre.SRE_Match object; span=(9, 15), match='Python'>
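Flags can be combined with `|`, written inline as `(?i)`, or stored together with the pattern via `re.compile`. The sketch below extends the example string with a second line of my own to show these options:

```python
import re

text = "abcdcdcdePython_py\nPYTHON again"

# Combine flags with | : ignore case, and let ^ match at each line start.
per_line = re.findall(r"^python", text, re.I | re.M)

# The same case-insensitive behaviour written inline with (?i).
inline = re.search(r"(?i)python", text)

# re.compile stores the pattern and its flags for reuse.
compiled = re.compile(r"python", re.I)
all_hits = compiled.findall(text)

print(per_line, inline.span(), all_hits)
```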

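4 Greedy and lazy modes
1) Greedy mode: match as much as possible; in the example below the match runs to the last y
2) Lazy mode: match as little as possible; in the example below the match stops at the first y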

import re
pattern1 = "P.*y"   # greedy mode
pattern2 = "P.*?y"  # lazy mode
string = "abcdcdcdePython_py"
result1 = re.search(pattern1, string)  
result2 = re.search(pattern2, string)
print(result1)
print(result2)

Output:

<_sre.SRE_Match object; span=(9, 18), match='Python_py'>
<_sre.SRE_Match object; span=(9, 11), match='Py'>
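The difference matters most when a delimiter occurs several times. As an illustration (the HTML-like string is my own), extracting tags with a greedy pattern swallows everything between the first `<` and the last `>`, while the lazy form stops at each closing `>`:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string, giving one long match.
greedy_tags = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first ">" that lets the match succeed.
lazy_tags = re.findall(r"<.*?>", html)

print(greedy_tags)
print(lazy_tags)
```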