Count the number of matches of a particular character in a string matched by a regex wildcard

Date: 2022-09-13 09:27:36

Can I keep a count of each different character matched within the regex itself?

Suppose the regex looks like />(.*)[^a]+/

Can I keep a count of the occurrences of, say, the letter p in the string captured by the group (.*)?

6 answers

#1


5  

You would have to capture the matched string and process it separately.

This code demonstrates:

use strict;
use warnings;

my $str = '> plantagenetgoosewagonattributes';

if ($str =~ />(.*)[^a]+/) {
  my $substr = $1;
  my %counts;
  $counts{$_}++ for $substr =~ /./g;
  print "'$_' - $counts{$_}\n" for sort keys %counts;
}

Output:

' ' - 1
'a' - 4
'b' - 1
'e' - 4
'g' - 3
'i' - 1
'l' - 1
'n' - 3
'o' - 3
'p' - 1
'r' - 1
's' - 1
't' - 5
'u' - 1
'w' - 1

#2


5  

Outside of the regex:

my $p_count = map /p/g, />(.*)[^a]/;

Self-contained:

local our $p_count;
/
   (?{ 0 })
   >
   (?: p (?{ $^R + 1 })
   |   [^p]
   )*
   [^a]
   (?{ $p_count = $^R; })
/x;

In both cases, you can easily extend this to count all letters. For example,

my %counts;
if (my ($seq) = />(.*)[^a]/) {
   ++$counts{$_} for split //, $seq;
}

my $p_count = $counts{'p'};

#3


3  

AFAIK, you can't. You can only capture a group with parentheses and then inspect the data captured by that group afterwards, for example its length.
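As a minimal sketch of that capture-then-inspect approach, the snippet below captures the group and then counts one character in the captured text. The helper name count_char_in_capture is made up for illustration, and the sample string is borrowed from another answer in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often $char occurs in the text captured
# by (.*). Returns undef when the pattern does not match at all.
sub count_char_in_capture {
    my ($str, $char) = @_;
    return undef unless $str =~ />(.*)[^a]+/;
    my $captured = $1;
    # Assigning to the empty list forces list context, so the /g match
    # returns every hit; assigning that list to a scalar yields its size.
    my $count = () = $captured =~ /\Q$char\E/g;
    return $count;
}

print count_char_in_capture('> plantagenetgoosewagonattributes', 'p'), "\n";
```

The counting happens after the match, outside the regex, so the pattern itself stays free of embedded code.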

#4


3  

Going along the lines of Borodin's solution, here is a pure bash one:

count=0
testarray=(a b c d e f g h i j k l m n o p q r s t u v w x y z)

string="> plantagenetgoosewagonattributes"   # the string
pattern=">(.*)[^a]+"                         # regex pattern

limitvar=${#testarray[@]}                    # array length

if [[ $string =~ $pattern ]]; then
  while [ "$count" -lt "$limitvar" ]; do
    # Delete every character except the current letter; what remains has length = count
    sub="${BASH_REMATCH[1]//[^${testarray[$count]}]}"
    echo "${testarray[$count]} = ${#sub}"
    ((count++))
  done
fi

Starting from bash 3.0, bash supports capture groups, which can be accessed through BASH_REMATCH[n].

The solution declares the characters to be counted as an array (check out declare -a for array declaration in more complex cases). Counting a single character would need no counter variable and no while construct, just a variable holding the character instead of an array.

If you are including ranges as in the code above, this array declaration does exactly the same thing:

testarray=(`echo {a..z}`)

Adding an if condition would take care of how characters with a count of 0 are displayed. I wanted to keep the solution as simple as possible.

#5


2  

There is the experimental, don't-use-me, (?{ code }) construct...

From man perlre:

"(?{ code })" WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine.

If that didn't scare you off, here's an example that counts the number of "p"s:

my $p_count;
">pppppbca" =~ /(?{ $p_count = 0 })>(p(?{$p_count++})|.)*[^a]+/;
print "$p_count\n";

#6


0  

First, a remark: because * is greedy, the final [^a]+ will never match more than one non-a character, i.e., you might as well drop the +.
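The greediness remark can be checked with a small sketch. It adds a second capture group around the trailing [^a]+ (not part of the original pattern) so that the tail can be inspected; the helper name trailing_match and the sample strings are assumptions for illustration. It also assumes single-line input, since . does not match a newline by default and the tail could then be longer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by the trailing [^a]+,
# exposed through an extra capture group, or undef if there is no match.
sub trailing_match {
    my ($str) = @_;
    return $str =~ />(.*)([^a]+)/ ? $2 : undef;
}

for my $s ('> plantagenetgoosewagonattributes', '>pppppbca', '>xyz') {
    my $tail = trailing_match($s);
    # The greedy .* only gives back as much as the tail needs to match.
    printf "%-36s tail='%s' (length %d)\n", "'$s'", $tail, length $tail;
}
```

Each sample reports a tail of length 1: the greedy .* gives characters back one at a time, and the first position where [^a] can match is either the last character or is followed by an a.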

And as @mvf said, you need to capture the string that the wildcard matches in order to count the characters in it. Perl regular expressions have no way to return a count of how many times a specific group matched. The engine probably keeps that number around to support the {n,m} quantifier mechanism, but you can't get at it.
