Fast recursive search and replace across a large number of files with bash/sed/awk: is it possible?

Time: 2022-01-08 09:00:55

I've been given a directory with subdirectories containing about 300,000 text files of various kinds. They all belong to a production project, so changing its architecture isn't an option.

Some tasks require replacing specific strings everywhere they occur. Using grep and sed takes about 5 minutes for each such replacement. Using find and sed takes a lot longer...

PhpStorm, by contrast, takes some time to index all the files when it opens this directory, but after that, searching and replacing across all the files is blazing fast!

Is it possible to get similar behaviour while staying in a terminal emulator? That is, to somehow index all the files in a given directory so that search and replace is fast afterwards?

Googling around, I found tools like cscope, idutils, and seascope, but as far as I could tell they have serious limitations: they only search, with no obvious way to replace, or they only index source files for functions, keywords, etc.

What I'm looking for is a way to index all the files for fast search and replace, with an automatically updated index. Like PhpStorm, but terminal-based and open source.

Thanks!

1 solution

#1


2  

How about this:

find <base directory> -type f -exec sed -i \
  -e 's/<pattern1>/<replacement1>/' \
  -e 's/<pattern2>/<replacement2>/' \
  ...
  -e 's/<patternN>/<replacementN>/' \
  {} +

The key is to specify all the replacements you want to make at once, so that you need only one pass over the file set. If most files need at least one replacement, then I can't see how you could do much better than that.
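As a rough illustration of the single-pass idea, here is a minimal sketch that runs it on a throwaway tree. It assumes GNU sed (for `-i` without a suffix argument); the file names and the `foo`/`bar` patterns are made up for the demo. It ends the `-exec` clause with `{} +`, so that find passes many file names to each sed process instead of starting one process per file, and it adds the `g` flag so that every occurrence on a line is replaced.

```shell
#!/usr/bin/env bash
set -e

# Throwaway tree standing in for the real project (hypothetical names).
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/sub dir"
printf 'foo and bar\nfoo again\n' > "$base/a.txt"
printf 'nothing to change\n' > "$base/sub dir/b.txt"

# One pass over the tree, with all substitutions given to a single sed command.
# '{} +' batches file names, so sed is started once per batch, not once per file.
find "$base" -type f -exec sed -i \
  -e 's/foo/FOO/g' \
  -e 's/bar/BAR/g' \
  {} +

# Show the result.
cat "$base/a.txt" "$base/sub dir/b.txt"
```

With 300,000 files, the cost of starting a process per file can dominate the run time, which is why the batching form is worth trying first.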

If only a few files need replacements, then you could instead do:

grep -R --files-with-matches --null '<pattern1>\|<pattern2>\|...<patternN>' <base directory> \
  | xargs -0 sed -i \
  -e 's/<pattern1>/<replacement1>/' \
  -e 's/<pattern2>/<replacement2>/' \
  ...
  -e 's/<patternN>/<replacementN>/'

Again, the key is to do all the replacements in one pass through the file list, but this version uses grep to pre-test whether each file needs any replacements. Pre-testing is faster than running the whole file through sed when there are no replacements to make, but files that do need replacements must still go through sed afterwards.
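The pre-filtering variant can be tried the same way. This is only a sketch under assumptions: GNU grep, xargs, and sed, with made-up file names and patterns. It passes file names NUL-terminated (`--null` with `xargs -0`) so that paths containing spaces survive the pipe, and uses `xargs -r` so that sed is not run at all when grep finds nothing. Multiple `-e` options to grep are used here in place of the `\|` alternation; either form selects files matching any of the patterns.

```shell
#!/usr/bin/env bash
set -e

# Throwaway tree standing in for the real project (hypothetical names).
base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
mkdir -p "$base/sub dir"
printf 'foo and bar\n' > "$base/sub dir/needs change.txt"
printf 'nothing to change\n' > "$base/untouched.txt"

# How many files would sed have to touch? Only those grep reports.
matched=$(grep -R --files-with-matches -e 'foo' -e 'bar' "$base" | wc -l)
echo "files needing replacement: $matched"

# grep selects the files, sed rewrites only those.
grep -R --files-with-matches --null -e 'foo' -e 'bar' "$base" \
  | xargs -0 -r sed -i \
      -e 's/foo/FOO/g' \
      -e 's/bar/BAR/g'

cat "$base/sub dir/needs change.txt"
```

Here only one of the two files reaches sed; the other is read once by grep and never rewritten.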

Anything fancier is likely to take you more time to build than you will end up saving.

Do note that generic tools such as grep and sed probably will not work well for you if you need to be smart about which text to replace, such as avoiding replacements inside quoted strings. If you need something like that, then you really should use tools that understand the format of the files.
