Learning Python re, part 7: [] character sets

Date: 2022-01-08 06:34:38

[]

Used to indicate a set of characters. In a set:

  • Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'.
  • Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digit numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it’s placed as the first or last character (e.g. [-a] or [a-]), it will match a literal '-'.
  • Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters '(', '+', '*', or ')'.
  • Character classes such as \w or \S (defined below) are also accepted inside a set, although the characters they match depend on whether ASCII or LOCALE mode is in force.
  • Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it’s not the first character in the set.
  • To match a literal ']' inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a right bracket, as well as a left bracket, braces, and parentheses.
  • Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future. This would change the syntax, so to facilitate this change a FutureWarning will be raised in ambiguous cases for the time being. That includes sets starting with a literal '[' or containing the literal character sequences '--', '&&', '~~', and '||'. To avoid a warning, escape them with a backslash.

Changed in version 3.7: FutureWarning is raised if a character set contains constructs that will change semantically in the future.
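To see the FutureWarning from the last point in practice, here is a small sketch. The helper name `future_warnings` is my own (it is not part of `re`), and it assumes Python 3.7 or later. It clears the `re` cache with `re.purge()` so each pattern is really parsed again, then records any FutureWarning that compiling the pattern produces.

```python
import re
import warnings


def future_warnings(pattern):
    """Compile `pattern` and return the text of any FutureWarning it triggers."""
    re.purge()  # clear re's pattern cache so the pattern is parsed again
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # report every warning, even repeats
        re.compile(pattern)
    return [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]


print(future_warnings("[[a]"))       # set starting with a literal '[': warns
print(future_warnings("[a&&b]"))     # set containing '&&': warns
print(future_warnings(r"[\[a]"))     # escaped '[': no warning, prints []
print(future_warnings(r"[a\&\&b]"))  # escaped '&&': no warning, prints []
```

The patterns still compile and match as ordinary sets today; the warning only flags syntax that might mean something different in a future version.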

Unlike predefined classes such as \d (which stands for the digits 0-9), [] lets you define your own custom set of characters.
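One detail worth knowing when comparing a hand-written set with \d: for str patterns in Python 3, \d matches any Unicode decimal digit, not only 0-9. A quick sketch (the sample string is made up; its last character is ARABIC-INDIC DIGIT THREE, U+0663):

```python
import re

text = "Room 42, floor \u0663"  # ends with ARABIC-INDIC DIGIT THREE

# A custom set matches exactly the characters listed in it.
print(re.findall(r"[0-9]", text))         # ['4', '2']
# \d also matches other Unicode decimal digits for str patterns...
print(re.findall(r"\d", text))            # also includes the Arabic-Indic digit
# ...unless re.ASCII restricts it, in which case it behaves like [0-9].
print(re.findall(r"\d", text, re.ASCII))  # ['4', '2']
```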

1. The elements can be listed one by one, e.g. [1abef].

import re
string1 ="abacasdfsdafasdbadfasdfsdfasdfdsvadzxcvsdasfazxvxc"
print(len(string1),"stringr",re.match("[asbgdfzxcv]*",string1))

Output

50 stringr <re.Match object; span=(0, 50), match='abacasdfsdafasdbadfasdfsdfasdfdsvadzxcvsdasfazxvx>
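In the example above every character of string1 happens to be in the set, so the whole string matches. A small sketch with the [1abef] set from point 1 (sample strings made up) shows the other side: a set matches one character at a time, and with * the match stops at the first character that is not listed.

```python
import re

# [1abef] matches exactly one of the characters 1, a, b, e, f.
pattern = re.compile("[1abef]")
print(pattern.findall("1 bee fox"))  # ['1', 'b', 'e', 'e', 'f']

# With *, re.match keeps consuming while each character is in the set
# and stops at the first one that is not ('z' here).
m = re.match("[1abef]*", "fab1zz")
print(m.group(), m.span())  # fab1 (0, 4)
```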

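2. A hyphen - between two characters defines a range, from the first character to the second. To include a literal hyphen, escape it with a backslash; if the hyphen is the first or last character in the set, it is also treated as a literal minus sign. Similarly, to include the character ']' in the set, precede it with a backslash (or put it first in the set).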

import re
string1 ="abacasdfsdafasdbadfasdfsdfasdfdsvadzxcvsdasfazxvxc"
print(len(string1),"stringr",re.match("[a-z]*",string1))

Output

50 stringr <re.Match object; span=(0, 50), match='abacasdfsdafasdbadfasdfsdfasdfdsvadzxcvsdasfazxvx>
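The range rules from point 2 can be checked with a few one-liners. This is only a sketch; the sample strings are made up for illustration.

```python
import re

# Two-digit numbers 00-59 (e.g. minutes or seconds); "60" is rejected.
print(re.findall(r"[0-5][0-9]", "07:59:60"))         # ['07', '59']
# Any hexadecimal digit.
print(bool(re.fullmatch(r"[0-9A-Fa-f]+", "1aF3")))   # True

# [a-z] is a range, so it does not match the '-' itself.
print(re.findall("[a-z]", "a-z"))                    # ['a', 'z']
# Escaped, first or last: in all three cases '-' is a literal.
for p in (r"[a\-z]", r"[-a]", r"[a-]"):
    print(p, re.findall(p, "a-z"))

# A literal ']' either escaped or placed first in the set.
print(re.findall(r"[\]]", "a]b"))                    # [']']
print(re.findall("[]a]", "a]b"))                     # ['a', ']']
```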

3. Many special characters lose their special meaning inside a set.

import re
string1 = "*^+()"
# inside the set, ( * ^ + ) are plain characters, so all five characters match
print(len(string1),"stringr",re.match("[(*^+)]*",string1))

Output

5 stringr <re.Match object; span=(0, 5), match='*^+()'>
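The same idea applies to '.', which is probably the most common surprise. A short sketch with a made-up string: outside a set '.' matches any character except a newline, inside a set it is just a dot. Note that not everything loses its meaning: a backslash, ']', a leading '^' and a '-' between two characters are still interpreted, as the other points in this post explain.

```python
import re

text = "a.b*c"

# Outside a set, '.' matches any character (except newline).
print(re.findall(".", text))    # ['a', '.', 'b', '*', 'c']
# Inside a set it is a literal dot.
print(re.findall("[.]", text))  # ['.']
# Likewise '*' inside a set is a literal asterisk, not a repetition operator.
print(re.findall("[*]", text))  # ['*']
```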

4. Character classes such as \w or \S can also be used inside a set, although which characters they match depends on whether ASCII or LOCALE mode is in force.

For reference, the docs describe the LOCALE flag (re.L / re.LOCALE) like this: Make \w, \W, \b, \B and case-insensitive matching dependent on the current locale. This flag can be used only with bytes patterns. The use of this flag is discouraged as the locale mechanism is very unreliable, it only handles one “culture” at a time, and it only works with 8-bit locales. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages. Corresponds to the inline flag (?L).
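A sketch of point 4, using a made-up string with accented letters: [\w-] means "any word character or a literal hyphen". By default \w inside the set is Unicode-aware for str patterns; with re.ASCII it shrinks to [a-zA-Z0-9_]. As the quoted docs say, re.LOCALE is only accepted for bytes patterns, so combining it with a str pattern raises ValueError.

```python
import re

text = "na\u00efve-caf\u00e9 x"  # "naïve-café x"

# Default (Unicode) mode: accented letters count as word characters.
print(re.findall(r"[\w-]+", text))            # ['naïve-café', 'x']
# ASCII mode: accented letters split the words.
print(re.findall(r"[\w-]+", text, re.ASCII))  # ['na', 've-caf', 'x']

# LOCALE with a str pattern is rejected.
locale_error = None
try:
    re.compile(r"[\w]", re.LOCALE)
except ValueError as exc:
    locale_error = exc
print("LOCALE with str pattern:", locale_error)
```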

5. A leading ^ complements the set: it matches any character that is not listed.

import re
string1 = "*^+()"
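# '*' is in the excluded set, so nothing can be consumed: empty match at (0, 0)
print(len(string1),"stringr",re.match("[^(*^+)]*",string1))
# only '(', '+' and ')' are excluded, so '*' and '^' match before '+' stops it
print(len(string1),"stringr",re.match("[^(+)]*",string1))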

Output

5 stringr <re.Match object; span=(0, 0), match=''>
5 stringr <re.Match object; span=(0, 2), match='*^'>
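Two more details about ^ from the quoted docs, in a small sketch with made-up strings: [^^] means "anything except '^'", while a ^ that is not first is an ordinary member of the set. A complemented set also matches characters such as a newline, since it only excludes what is listed.

```python
import re

text = "a^b^c"

# Leading ^ complements the set: anything except '^'.
print(re.findall("[^^]", text))      # ['a', 'b', 'c']
# ^ anywhere else is an ordinary character: this set is {'a', '^'}.
print(re.findall("[a^]", text))      # ['a', '^', '^']
# A complemented set matches a newline too (no DOTALL needed).
print(re.findall("[^a-z]", "x\ny"))  # ['\n']
```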