How do I check whether a Perl array contains a particular value?

Time: 2022-11-16 02:11:18

I am trying to figure out a way of checking for the existence of a value in an array without iterating through the array.

I am reading a file for a parameter. I have a long list of parameters I do not want to deal with. I placed these unwanted parameters in an array @badparams.

I want to read a new parameter and if it does not exist in @badparams, process it. If it does exist in @badparams, go to the next read.

11 solutions

#1


161  

Simply turn the array into a hash:

my %params = map { $_ => 1 } @badparams;

if(exists($params{$someparam})) { ... }

You can also add more (unique) params to the list:

$params{$newparam} = 1;

And later get a list of (unique) params back:

@badparams = keys %params;
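
Applied to the question's use case, a minimal sketch might look like the following. The parameter names and the filter_params helper are made up for illustration; only the hash-lookup idea comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical data: the real @badparams would come from your own list.
my @badparams = qw(debug verbose trace);

# Build the lookup table once; each later check is a single hash probe.
my %is_bad = map { $_ => 1 } @badparams;

# filter_params is an illustrative helper, not part of any library.
# It returns the parameters that are not in the bad list, in order.
sub filter_params {
    my @incoming = @_;
    my @kept;
    for my $param (@incoming) {
        next if exists $is_bad{$param};   # skip unwanted parameters
        push @kept, $param;
    }
    return @kept;
}

my @good = filter_params(qw(host debug port trace user));
print "@good\n";   # prints "host port user"
```

Building the hash still walks the array once, but every check after that avoids a scan.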

#2


191  

Best general-purpose choice, especially for short arrays (1000 items or fewer) and for coders who are unsure which optimizations best suit their needs.

# $value is interpolated as a regex, so any metacharacters in it are live.
# Use \Q$value\E (or $_ eq $value) when you need a literal match.
if ( grep( /^$value$/, @array ) ) {
  print "found it";
}
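
Because the pattern is built from $value, a value containing regex metacharacters can match more than intended. The sketch below shows the difference on made-up data; \Q...\E and eq are standard Perl, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my @array = ('abc', 'a.c');   # made-up sample data
my $value = 'a.c';

# Unquoted: '.' matches any character, so 'abc' also counts as a hit.
my $regex_hits  = grep { /^$value$/ } @array;

# \Q...\E quotes metacharacters, so only the literal string matches.
my $quoted_hits = grep { /^\Q$value\E$/ } @array;

# Plain string equality avoids the regex engine altogether.
my $eq_hits     = grep { $_ eq $value } @array;

print "$regex_hits $quoted_hits $eq_hits\n";   # prints "2 1 1"
```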

It has been mentioned that grep passes through all values even if the first value in the array matches. This is true; however, grep is still very fast in most cases. If you're talking about short arrays (fewer than 1000 items), most algorithms are going to be pretty fast anyway. If you're talking about very long arrays (1,000,000 items), grep is acceptably quick regardless of whether the item is first, in the middle, or last in the array.

Optimization cases for longer arrays:

If your array is sorted, use a binary search.

If the same array is searched many times, copy it into a hash first and then check the hash. If memory is a concern, move each item from the array into the hash instead; this is more memory efficient but destroys the original array.

If the same values are searched for repeatedly within the array, lazily build a cache: as each item is searched for, first check whether the result is already stored in a persistent hash. If it is not, search the array and store the result in the hash, so that the next lookup for that value finds it there and skips the search.

Note: these optimizations will only be faster when dealing with long arrays. Don't over optimize.
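
Two of the optimizations above can be sketched as follows. Both helpers (binary_search and seen_cached) are hypothetical names written for this illustration, the data is made up, and the binary search assumes the array was sorted with Perl's default string sort.

```perl
use strict;
use warnings;

# Binary search over an array sorted with the default string sort.
# Returns 1 if $target is present, 0 otherwise.
sub binary_search {
    my ($sorted, $target) = @_;
    my ($low, $high) = (0, $#{$sorted});
    while ($low <= $high) {
        my $mid = int(($low + $high) / 2);
        my $cmp = $sorted->[$mid] cmp $target;
        return 1 if $cmp == 0;
        if ($cmp < 0) { $low  = $mid + 1 }
        else          { $high = $mid - 1 }
    }
    return 0;
}

# Lazily built cache: each distinct value triggers at most one array scan.
my @haystack = qw(pear apple fig apple);
my %cache;
my $scans = 0;   # counts how many times the array was actually scanned

sub seen_cached {
    my ($value) = @_;
    unless (exists $cache{$value}) {
        $scans++;
        $cache{$value} = (grep { $_ eq $value } @haystack) ? 1 : 0;
    }
    return $cache{$value};
}

my @sorted = sort @haystack;   # apple apple fig pear
print binary_search(\@sorted, 'fig'),  "\n";   # prints 1
print binary_search(\@sorted, 'kiwi'), "\n";   # prints 0
seen_cached('fig') for 1 .. 3;
print "$scans\n";                              # prints 1
```

The cache stores negative results too, so repeated lookups for a value that is absent also skip the scan.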

#3


98  

You can use the smartmatch feature, available since Perl 5.10, as follows.

For a literal value lookup, this will do the trick:

if ( "value" ~~ @array ) { ... }

For a scalar lookup, the same form works:

if ( $val ~~ @array ) { ... }

For an inline array, use an anonymous array reference:

if ( $var ~~ ['bar', 'value', 'foo'] ) { ... }

In Perl 5.18 smartmatch was flagged as experimental, so you need to turn off the warning by enabling the experimental pragma in your script/module (note that Perl 5.38 later deprecated the operator):

use experimental 'smartmatch';

Alternatively, if you want to avoid smartmatch, then as Aaron said use:

if ( grep( /^$value$/, @array ) ) {
  #TODO:
}

#4


36  

This blog post discusses the best answers to this question.

As a short summary, if you can install CPAN modules then the most readable solutions are:

any(@ingredients) eq 'flour';

or

@ingredients->contains('flour');

However, a more common idiom is:

any { $_ eq 'flour' } @ingredients

But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the ~~ "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list.

any() will stop as soon as it finds your value.
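
The early exit is easy to observe by counting comparisons. This is a minimal sketch, assuming List::Util 1.33 or later (the first version to export any); the ingredient list and the counters are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @ingredients = qw(flour sugar eggs butter);   # made-up data

my ($any_checks, $grep_checks) = (0, 0);

# any stops at the first element for which the block is true.
my $found_any  = any  { $any_checks++;  $_ eq 'flour' } @ingredients;

# grep always evaluates the block for every element.
my $found_grep = grep { $grep_checks++; $_ eq 'flour' } @ingredients;

print "any: $any_checks comparison(s), grep: $grep_checks\n";
# prints "any: 1 comparison(s), grep: 4"
```

When the value is absent, both forms end up visiting every element, so the saving depends on where (and whether) the match occurs.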

Check out the blog post for more details.

#5


11  

Even though it's convenient to use, it seems like the convert-to-hash solution costs quite a lot of performance, which was an issue for me.

#!/usr/bin/perl
use Benchmark;
my @list;
for (1..10_000) {
    push @list, $_;
}

timethese(10000, {
  'grep'    => sub {
            if ( grep(/^5000$/o, @list) ) {
                # code
            }
        },
  'hash'    => sub {
            my %params = map { $_ => 1 } @list;
            if ( exists($params{5000}) ) {
                # code
            }
        },
});

Output of benchmark test:

Benchmark: timing 10000 iterations of grep, hash...
          grep:  8 wallclock secs ( 7.95 usr +  0.00 sys =  7.95 CPU) @ 1257.86/s (n=10000)
          hash: 50 wallclock secs (49.68 usr +  0.01 sys = 49.69 CPU) @ 201.25/s (n=10000)

#6


10  

@eakssjo's benchmark is broken: it measures creating hashes in a loop against creating regexes in a loop. Fixed version (plus I've added List::Util::first and List::MoreUtils::any):

use List::Util qw(first);
use List::MoreUtils qw(any);
use Benchmark;

my @list = ( 1..10_000 );
my $hit = 5_000;
my $hit_regex = qr/^$hit$/; # precompute regex
my %params;
$params{$_} = 1 for @list;  # precompute hash
timethese(
    100_000, {
        'any' => sub {
            # match each element against the precompiled regex
            die unless ( any { $_ =~ $hit_regex } @list );
        },
        'first' => sub {
            die unless ( first { $_ =~ $hit_regex } @list );
        },
        'grep' => sub {
            die unless ( grep { $_ =~ $hit_regex } @list );
        },
        'hash' => sub {
            die unless ( $params{$hit} );
        },
    });

And result (it's for 100_000 iterations, ten times more than in @eakssjo's answer). These figures were recorded with blocks that returned $hit_regex itself, which is always true, rather than matching it against $_; any and first therefore stopped at the first element, so expect them to be slower with the matching shown above:

Benchmark: timing 100000 iterations of any, first, grep, hash...
       any:  0 wallclock secs ( 0.67 usr +  0.00 sys =  0.67 CPU) @ 149253.73/s (n=100000)
     first:  1 wallclock secs ( 0.63 usr +  0.01 sys =  0.64 CPU) @ 156250.00/s (n=100000)
      grep: 42 wallclock secs (41.95 usr +  0.08 sys = 42.03 CPU) @ 2379.25/s (n=100000)
      hash:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)

#7


2  

You certainly want a hash here. Place the bad parameters as keys in the hash, then check whether a particular parameter exists in the hash.

our %bad_params = map { $_ => 1 } qw(badparam1 badparam2 badparam3);

if ($bad_params{$new_param}) {
  print "That is a bad parameter\n";
}

If you are really interested in doing it with an array, look at List::Util or List::MoreUtils.

#8


2  

Method 1: grep (be careful: the value is interpolated as a regex).

Avoid grep if resource usage is a concern.

if ( grep( /^$value$/, @badparams ) ) {
  print "found";
}

Method 2: Linear Search

for (@badparams) {
    if ($_ eq $value) {
        print "found";
        last;   # stop at the first match
    }
}

Method 3: Use a hash

my %hash = map {$_ => 1} @badparams;
print "found" if (exists $hash{$value});

Method 4: smartmatch

(added in Perl 5.10, marked as experimental in Perl 5.18).

use experimental 'smartmatch';  # for perl 5.18
print "found" if ($value ~~ @badparams);

Method 5: Use the CPAN module List::MoreUtils (it is not a core module; the core List::Util also exports any since version 1.33)

use List::MoreUtils qw(any);
my @badparams = (1,2,3);
my $value = 1;
print "found" if any {$_ eq $value} @badparams;

#9


0  

There are two ways you can do this. You can throw the values into a hash to use as a lookup table, as suggested by the other posts. (I'll just add another idiom.)

my %bad_param_lookup;
@bad_param_lookup{ @bad_params } = ( 1 ) x @bad_params;

But if the data is mostly word characters with few regex metacharacters, you can dump it into a regex alternation:

use English qw<$LIST_SEPARATOR>;

my $regex_str = do {
    local $LIST_SEPARATOR = '|';
    "(?:@bad_params)";
};

# $front_delim and $back_delim being any characters that come before and after.
my $regex = qr/$front_delim$regex_str$back_delim/;

This solution would have to be tuned for the types of "bad values" you're looking for. And again, it might be totally inappropriate for certain types of strings, so caveat emptor.
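
One way to make the alternation safer is to quote each value and anchor the pattern, as in this sketch. The sample values and the choice of \A and \z in place of $front_delim and $back_delim are assumptions for illustration; quotemeta and join are Perl builtins.

```perl
use strict;
use warnings;

# Made-up values, some containing regex metacharacters.
my @bad_params = ('foo', 'a.b', 'x|y');

# Quote every value so '.', '|' and friends are matched literally,
# then join the quoted values into one alternation.
my $alternation = join '|', map { quotemeta($_) } @bad_params;

# \A and \z anchor the match to the whole string.
my $regex = qr/\A(?:$alternation)\z/;

for my $candidate ('a.b', 'axb', 'x', 'x|y') {
    printf "%-4s %s\n", $candidate, ($candidate =~ $regex ? 'bad' : 'ok');
}
```

Only 'a.b' and 'x|y' are reported as bad here; without quotemeta, 'axb' and 'x' would match as well.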

#10


0  

@files is an existing array

my @new_values = grep(/^2[\d].[\d][A-Za-z]?/, @files);

print join("\n", @new_values);

print "\n";

/^2[\d].[\d][A-Za-z]?/ matches values starting with 2; you can put any regular expression here.

#11


-1  

my @badparams = (1,2,5,7,'a','zzz');

my $badparams = join('|',@badparams);   # '|' or any other character not present in params

foreach my $par (4,5,6,7,'a','z','zzz')
{
    if ($badparams =~ /\b$par\b/)
    {
        print "$par is present\n";
    }
    else
    {
        print "$par is not present\n";
    }
}

You may want to check that numeric values have consistent leading spaces.
