Regex findall produces strange results in Python 3

Time: 2022-09-28 04:05:10

I want to find all the docblocks of a string using python. My first attempt was this:

import re

b = re.compile(r'\/\*(.)*?\*/', re.M|re.S)
match = b.search(string)
print(match.group(0))

And that worked, but as you'll notice yourself: it'll only print out 1 docblock, not all of them.

So I wanted to use the findall function, which says it would output all the matches, like this:

b = re.compile(r'\/\*(.)*?\*/', re.M|re.S)
match = b.findall(string)
print(match)

But I never get anything useful, only these kinds of arrays:

[' ', ' ', ' ', '\t', ' ', ' ', ' ', ' ', ' ', '\t', ' ', ' ', ' ']

The documentation does say it'll return empty strings, but I don't know how this can be useful.

2 answers

#1


You need to move the quantifier inside the capture group:

b = re.compile(r'\/\*(.*?)\*/', re.M|re.S)
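
A minimal sketch of the fix in use, assuming a made-up `source` string with two docblocks (the sample text and variable names are hypothetical, not from the question). Note that with a capture group in the pattern, `findall` returns only the group's contents, so the `/*` and `*/` delimiters are dropped; a non-capturing group `(?:...)` is one way to get the whole docblock back instead.

```python
import re

# Hypothetical sample input containing two docblocks.
source = """/** First block */
int a;
/**
 * Second block
 */
int b;
"""

# Quantifier inside the capture group: findall returns the text
# between the delimiters for every docblock.
inner = re.findall(r'/\*(.*?)\*/', source, re.S)

# Non-capturing group: findall returns each whole match,
# delimiters included.
whole = re.findall(r'/\*(?:.*?)\*/', source, re.S)

print(inner)
print(whole)
```

Which variant is preferable depends on whether the comment delimiters are wanted in the result.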

#2


To expand a bit on Rohit Jain's (correct) answer: with the quantifier outside the parentheses you're saying "match (non-greedily) any number of single characters, capturing one character at a time". A repeated group only keeps its last repetition, so once the whole docblock has matched, the group holds just the final character before the closing */. And because the pattern contains a group, findall returns the group's contents rather than the full match, which is why you get a list of single whitespace characters. By moving the quantifier inside the parens (that is, (.*?) instead of what you had before) you're now saying "match any number of characters, and capture all of them".
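
The behaviour of the original pattern can be reproduced with a small sketch. The sample `text` below is an assumption chosen for illustration; `finditer` is used alongside `findall` to show that the pattern does match the whole comment even though the group holds only one character.

```python
import re

# Hypothetical input: two comments, the second ending in a tab.
text = "/* abc */ x /* de\t*/"

# Quantifier outside the group: the group is overwritten on every
# repetition, so findall yields only the last character before
# each closing */.
last_chars = re.findall(r'/\*(.)*?\*/', text, re.S)

# The full match is still available through the match objects.
full = [m.group(0) for m in re.finditer(r'/\*(.)*?\*/', text, re.S)]

print(last_chars)
print(full)
```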

I hope this helps you understand what's going on a bit better.
