In Perl, what reasons are there to prefer glob over readdir (or vice versa)?

Time: 2022-09-01 23:29:43

This question is a spin-off from this one. Some history: when I first learned Perl, I pretty much always used glob rather than opendir + readdir because I found it easier. Then later various posts and readings suggested that glob was bad, and so now I pretty much always use readdir.

After thinking over this recent question I realized that my reasons for one or the other choice may be bunk. So, I'm going to lay out some pros and cons, and I'm hoping that more experienced Perl folks can chime in and clarify. The question in a nutshell: are there compelling reasons to prefer glob to readdir, or readdir to glob (in some or all cases)?

glob pros:

  1. No dotfiles (unless you ask for them)
  2. Order of items is guaranteed
  3. No need to prepend the directory name onto items manually
  4. Better name (c'mon - glob versus readdir is no contest if we're judging by names alone)
  5. (From ysth's answer; cf. glob cons 4 below) Can return non-existent filenames:

    @deck = glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
    

glob cons:

  1. Older versions are just plain broken (but 'older' means pre 5.6, I think, and frankly if you're using pre 5.6 Perl, you have bigger problems)
  2. Calls stat each time (i.e., useless use of stat in most cases).
  3. Problems with spaces in directory names (is this still true?)
  4. (From brian's answer) Can return filenames that don't exist:

    $ perl -le 'print glob "{ab}{cd}"'
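
On the spaces question in glob cons 3: core glob is documented to split its argument on whitespace and treat each piece as a separate pattern, while bsd_glob from File::Glob takes the whole string as one pattern. The sketch below shows the difference; the temporary directory and the file names are made up for illustration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical layout: a directory whose name contains a space.
my $base = tempdir(CLEANUP => 1);
mkdir "$base/my dir" or die "mkdir failed: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$base/my dir/$name" or die "open failed: $!";
    close $fh;
}

# Core glob treats whitespace as a pattern separator, so this is really
# two patterns: "$base/my" and "dir/*".
my @split = glob("$base/my dir/*");

# bsd_glob takes the whole string as a single pattern.
my @whole = bsd_glob("$base/my dir/*");

print "glob:     @split\n";
print "bsd_glob: @whole\n";
```

So the problem is still there for plain glob, but it can be sidestepped without giving up glob-style patterns.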
    

readdir pros:

  1. (From brian's answer) opendir returns a filehandle which you can pass around in your program (and reuse), but glob simply returns a list
  2. (From brian's answer) readdir is a proper iterator and provides functions to rewinddir, seekdir, telldir
  3. Faster? (Pure guess based on some of glob's features from above. I'm not really worried about this level of optimization anyhow, but it's a theoretical pro.)
  4. Less prone to edge-case bugs than glob?
  5. Reads everything (dotfiles too) by default (this is also a con)
  6. May convince you not to name a file 0 (a con also - see Brad's answer)
  7. Anyone? Bueller? Bueller?

readdir cons:

  1. If you don't remember to prepend the directory name, you will get bit when you try to do filetests or copy items or edit items or...
  2. If you don't remember to grep out the . and .. items, you will get bit when you count items, or try to walk recursively down the file tree or...
  3. Did I mention prepending the directory name? (A sidenote, but my very first post to the Perl Beginners mail list was the classic, "Why does this code involving filetests not work some of the time?" problem related to this gotcha. Apparently, I'm still bitter.)
  4. Items are returned in no particular order. This means you will often have to remember to sort them in some manner. (This could be a pro if it means more speed, and if it means that you actually think about how and if you need to sort items.) Edit: Horrifically small sample, but on a Mac readdir returns items in alphabetical order, case insensitive. On a Debian box and an OpenBSD server, the order is utterly random. I tested the Mac with Apple's built-in Perl (5.8.8) and my own compiled 5.10.1. The Debian box is 5.10.0, as is the OpenBSD machine. I wonder if this is a filesystem issue, rather than Perl?
  5. Reads everything (dotfiles too) by default (this is also a pro)
  6. Doesn't necessarily deal well with a file named 0 (see pros also - see Brad's answer)

10 answers

#1


42  

You missed the most important, biggest difference between them: glob gives you back a list, but opendir gives you a directory handle. You can pass that directory handle around to let other objects or subroutines use it. With the directory handle, the subroutine or object doesn't have to know anything about where it came from, who else is using it, and so on:

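 sub use_any_dir_handle {
      my( $dh ) = @_;
      rewinddir $dh;
      # ...do some filtering; as an example, drop the . and .. entries...
      my @files = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
      return \@files;
      }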

With the dirhandle, you have a controllable iterator that you can move around in with seekdir, whereas with glob you just get the next item.
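
As a small sketch of that kind of control, the helper below (peek_entries is a made-up name) reads a few entries ahead and then puts the handle back where it was, using telldir and seekdir:

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $n entries, then return the handle to the
# position it had when we were called.
sub peek_entries {
    my ($dh, $n) = @_;
    my $pos = telldir $dh;          # remember the current position
    my @seen;
    while (@seen < $n) {
        my $entry = readdir $dh;    # scalar context: one entry at a time
        last unless defined $entry;
        push @seen, $entry;
    }
    seekdir $dh, $pos;              # go back to the remembered position
    return @seen;
}
```

A caller that receives the handle afterwards sees the directory stream exactly as it was before the peek, which is not something a plain list from glob can express.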

As with anything though, the costs and benefits only make sense when applied to a certain context. They do not exist outside of a particular use. You have an excellent list of their differences, but I wouldn't classify those differences without knowing what you were trying to do with them.

Some other things to remember:

  • You can implement your own glob with opendir, but not the other way around.

  • glob uses its own wildcard syntax, and that's all you get.

  • glob can return filenames that don't exist:

    $ perl -le 'print glob "{ab}{cd}"'
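
To illustrate the first point, here is a minimal sketch of a glob built on opendir and readdir. The name my_glob is hypothetical, and it handles only * and ? in a single directory (no braces, character classes, or multi-level patterns).

```perl
use strict;
use warnings;

# Hypothetical helper: a tiny glob built on opendir/readdir.
# Supports only * and ? within one directory.
sub my_glob {
    my ($dir, $pattern) = @_;

    # Translate the wildcard pattern into a regex, quoting everything else.
    my $re = '';
    for my $ch (split //, $pattern) {
        $re .= $ch eq '*' ? '.*'
             : $ch eq '?' ? '.'
             :              quotemeta $ch;
    }

    # Like glob, only return dotfiles when the pattern asks for them.
    my $want_dot = $pattern =~ /^\./;

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @matches = grep { ($want_dot || !/^\./) && /\A$re\z/s } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @matches;
}
```

Because the matching step is ordinary Perl, it could just as well be a full regex or a file test, which is the flexibility glob's fixed wildcard syntax does not offer.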
    

#2


8  

glob pros: Can return 'filenames' that don't exist:

use 5.010;                     # for say
use List::Util qw(shuffle);

my @deck = shuffle glob "{A,K,Q,J,10,9,8,7,6,5,4,3,2}{\x{2660},\x{2665},\x{2666},\x{2663}}";
while (my @hand = splice @deck,0,13) {
    say join ",", @hand;
}
__END__
6♥,8♠,7♠,Q♠,K♣,Q♦,A♣,3♦,6♦,5♥,10♣,Q♣,2♠
2♥,2♣,K♥,A♥,8♦,6♠,8♣,10♠,10♥,5♣,3♥,Q♥,K♦
5♠,5♦,J♣,J♥,J♦,9♠,2♦,8♥,9♣,4♥,10♦,6♣,3♠
3♣,A♦,K♠,4♦,7♣,4♣,A♠,4♠,7♥,J♠,9♥,7♦,9♦

#3


6  

Here is a disadvantage for opendir and readdir.

{
  open my $file, '>', 0;
  print {$file} 'Breaks while( readdir ){ ... }'
}
opendir my $dir, '.';

my $a = 0;
++$a for readdir $dir;
print $a, "\n";

rewinddir $dir;

my $b = 0;
++$b while readdir $dir;
print $b, "\n";

You would expect that code to print the same number twice, but it doesn't, because there is a file named 0. On my computer it prints 251 and 188, tested with Perl v5.10.0 and v5.10.1.

This problem also makes it so that this just prints out a bunch of empty lines, regardless of the existence of file 0:

use 5.10.0;
opendir my $dir, '.';

say while readdir $dir;

Whereas this always works just fine:

use 5.10.0;
my $a = 0;
++$a for glob '*';
say $a;

my $b = 0;
++$b while glob '*';
say $b;

say for glob '*';
say while glob '*';

I fixed these issues, and sent in a patch which made it into Perl v5.11.2, so this will work properly with Perl v5.12.0 when it comes out.

My fix converts this:

while( readdir $dir ){ ... }

into this:

while( defined( $_ = readdir $dir ) ){ ... }

Which makes it work the same way that readline has long worked on filehandles. Actually it is the same bit of code; I just added another element to the corresponding if statements.
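
On Perls without that fix, the explicit form can simply be written out by hand. A minimal sketch (count_entries is a made-up name) that keeps going past a file named 0:

```perl
use strict;
use warnings;

# Hypothetical helper: count directory entries with an explicit defined()
# test, so an entry named "0" does not end the loop early.
sub count_entries {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my $count = 0;
    while (defined(my $entry = readdir $dh)) {
        $count++;
    }
    closedir $dh;
    return $count;
}
```

Using a lexical variable instead of $_ also avoids clobbering whatever the caller had in $_.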

#4


5  

glob makes it convenient to read all the subdirectories of a given fixed depth, as in glob "*/*/*". I've found this handy on several occasions.
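
A small sketch of that idea, generalized to any depth. The helper name dirs_at_depth is made up, and bsd_glob is used so a root path containing spaces is not split into several patterns.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: the directories exactly $depth levels below $root.
# Assumes $root itself contains no wildcard characters.
sub dirs_at_depth {
    my ($root, $depth) = @_;
    my $pattern = join '/', $root, ('*') x $depth;   # e.g. "root/*/*"
    return grep { -d $_ } bsd_glob($pattern);
}
```

The pattern matches files as well as directories, so the -d test is what narrows the result to subdirectories.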

#5


4  

Well, you pretty much covered it. All that taken into account, I would tend to use glob when I'm throwing together a quick one-off script and its behavior is just what I want, and use opendir and readdir in ongoing production code or libraries, where I can take my time and where clearer, cleaner code is helpful.

#6


3  

For small, simple things, I prefer glob. Just the other day, I used it and a twenty-line Perl script to retag a large portion of my music library. glob, however, has a pretty strange name. Glob? It's not intuitive at all, as far as a name goes.

My biggest hangup with readdir is that it treats a directory in a way that's somewhat odd to most people. Usually, programmers don't think of a directory as a stream, they think of it as a resource, or list, which glob provides. The name is better, the functionality is better, but the interface still leaves something to be desired.

#7


2  

That was a pretty comprehensive list. readdir (and readdir + grep) has less overhead than glob and so that is a plus for readdir if you need to analyze lots and lots of directories.
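
If you would rather measure than guess, a rough sketch with the core Benchmark module looks like this. The directory, the iteration count, and the helper names via_readdir and via_glob are all assumptions for illustration, and the numbers will depend heavily on the filesystem and the directory size.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Glob qw(bsd_glob);

my $dir = '.';   # assumption: point this at a directory you care about

# Hypothetical helper: count non-dot entries via opendir/readdir + grep.
sub via_readdir {
    my ($d) = @_;
    opendir my $dh, $d or die "Cannot open $d: $!";
    my @names = grep { !/^\./ } readdir $dh;
    closedir $dh;
    return scalar @names;
}

# Hypothetical helper: count the same entries via a glob pattern.
sub via_glob {
    my ($d) = @_;
    my @names = bsd_glob("$d/*");
    return scalar @names;
}

# Run each listing 1000 times and print a comparison table.
cmpthese(1000, {
    readdir => sub { via_readdir($dir) },
    glob    => sub { via_glob($dir) },
});
```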

#8


2  

glob pros:

3) No need to prepend the directory name onto items manually

Exception:

say for glob "*";

--output:--
1perl.pl
2perl.pl
2perl.pl.bak
3perl.pl
3perl.pl.bak
4perl.pl
data.txt
data1.txt
data2.txt
data2.txt.out

As far as I can tell, the rule for glob is: you must provide a full path to the directory to get full paths back. The Perl docs do not seem to mention that, and neither do any of the posts here.
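
Put another way, glob hands back each match with whatever directory part the pattern carried: a bare "*" gives bare names, and "some/dir/*" gives "some/dir/name". If you glob with a directory prefix but want bare names, one option is to strip the prefix afterwards. A minimal sketch, with the made-up helper name names_in:

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);

# Hypothetical helper: bare file names matching $pattern inside $dir.
# Assumes $dir itself contains no wildcard characters.
sub names_in {
    my ($dir, $pattern) = @_;
    # bsd_glob returns "$dir/name"; basename drops the directory part.
    return map { basename($_) } bsd_glob("$dir/$pattern");
}
```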

That means that glob can be used in place of readdir when you want just filenames (rather than full paths), and you don't want hidden files returned, i.e. ones starting with '.'. For example,

chdir ("../..");  
say for glob("*");

#9


2  

On a similar note, File::Slurp has a function called read_dir.

Since I use File::Slurp's other functions a lot in my scripts, read_dir has also become a habit.

It also has the following options: err_mode, prefix, and keep_dot_dot.

#10


1  

First, do some reading. Chapter 9.6 of the Perl Cookbook outlines the point I want to get to nicely, just under the Discussion heading.

Second, search for glob and dosglob in your Perl directory. While many different sources (ways to get the file list) can be used, I point you to dosglob because if you happen to be on a Windows platform (and using the dosglob solution), it actually uses opendir/readdir/closedir. Other versions use built-in shell commands or precompiled OS-specific executables.

If you know you are targeting a specific platform, you can use this information to your advantage. Just for reference, I looked into this on Strawberry Perl Portable edition 5.12.2, so things may be slightly different on newer or original versions of Perl.
