Search and replace hundreds of strings in tens of thousands of files?

Time: 2022-09-01 22:47:46

I am looking into changing the file names of hundreds of files in a (C/C++) project that I work on. The problem is that our software has tens of thousands of files that include (i.e. #include) the hundreds of files that will be renamed. This looks like a maintenance nightmare. If I do this I will be stuck in Ultra-Edit for weeks, rolling hundreds of regexes by hand like so:

^\#include.*["<\\/]stupid_name.*$

with

#include <dir/new_name.h>

Such drudgery would be worse than peeling hundreds of potatoes with a spoon in a sunken submarine in the Antarctic. I think the ideal would be to put the inputs and outputs into a table like so:

stupid_name.h <-> <dir/new_name.h>
stupid_nameb.h <-> <dir/new_nameb.h>
stupid_namec.h <-> <dir/new_namec.h>

and feed this into a regular expression engine / tool / app / etc...
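
For illustration, a table in that shape could be read into (old, new) pairs before any tool touches the code. This is only a minimal Python sketch; `parse_rename_table` is a hypothetical helper name, and the `<->` separator is assumed to appear exactly once per line, as in the table above.

```python
def parse_rename_table(text):
    """Parse lines of the form 'old.h <-> <dir/new.h>' into (old, new) pairs."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        old, sep, new = line.partition("<->")
        old, new = old.strip(), new.strip()
        if not sep or not old or not new:
            raise ValueError(f"line {line_no}: expected 'old <-> new', got {line!r}")
        pairs.append((old, new))
    return pairs


TABLE = """\
stupid_name.h <-> <dir/new_name.h>
stupid_nameb.h <-> <dir/new_nameb.h>
stupid_namec.h <-> <dir/new_namec.h>
"""

pairs = parse_rename_table(TABLE)
for old, new in pairs:
    print(old, "->", new)
```

Rejecting malformed lines up front keeps a typo in the table from silently turning into a bad replacement later.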

My Ultimate Question: Is there a tool that will do that?

Bonus Question: Is it multi-threaded?

I looked at quite a few search and replace topics here on this website, and found lots of standard queries that asked a variant of the following question:

standard question: Replace one term in N files.

as opposed to:

my question: Replace N terms in N files.

Thanks in advance for any replies.

7 solutions

#1


1  

As Mark Wilkins says, this is a workable plan with whatever regex-handy scripting tool you prefer, but I'd suggest a couple of additional points:

  1. Use two scripts: one to translate your list into regexes, and another to apply them. Trying to do both jobs in one script is asking for trouble.

  2. Don't forget to change the #include directives and rename the header files at the same time.

  3. If you know how to change one thing in N files, then, heck, you can just loop over the K things you want to change. It's not the most efficient way in terms of processor time, but that's not the bottleneck here.

  4. This approach will work in theory, but if it works in practice on the first try then your code base is cleaner than anything (of that size) I've ever seen. There will almost certainly be little surprises: a hard-coded path that doesn't match the regex, a bad name that collides with a good name, some other glitch nobody would have thought of. I suggest starting small, with one or two pairs of names, compiling after every replacement, and retreating in case of trouble. If you do this right you can set it up to run overnight, and in the morning you'll have a working code base that's almost done, plus a list of the names that caused trouble and need human attention.

#2


2  

I would use awk, a command line tool similar to sed.

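# Keep the original as a backup, then write the rewritten copy.
mv file.x file.x.bak
awk '{
  gsub( "#include \"bad_one\\.h\"" , "#include \"good_one.h\"" );
  gsub( "#include \"bad_two\\.h\"" , "#include \"good_two.h\"" );
  print;  # without an explicit print this block would output nothing
}' file.x.bak > file.x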

Once you are at a terminal, use man awk to see more details.

#3


1  

I think your idea of putting the old/new names into a single location is a good one. It would certainly reduce the difficulty of maintaining and verifying the changes. It seems like this is the obvious answer, but I think that using any of the popular scripting languages such as ruby, python, perl, etc. would make this task fairly straightforward. The script could read in the file that has the old/new replacement information, construct the appropriate regular expressions from that, and then process the files that need the replacements.
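
As a rough illustration of "construct the regular expressions from the table", here is a minimal Python sketch. The names `include_pattern` and `rewrite_includes` are made up for this example, and the pattern is my own tightening of the one in the question: it requires the old file name to be the whole last path component, so `stupid_name.h` does not also hit `stupid_nameb.h`. Like the original regex, it replaces the entire line, so a trailing comment on the #include line would be lost.

```python
import re


def include_pattern(old_name):
    """Regex for a whole #include line naming old_name, with optional directories."""
    return re.compile(
        r'^[ \t]*#[ \t]*include[ \t]*["<](?:[^">\n]*[\\/])?'
        + re.escape(old_name)   # escape the dot in names like foo.h
        + r'[">].*$',
        re.MULTILINE,
    )


def rewrite_includes(source, pairs):
    """Replace each matching #include line with '#include <new>'."""
    for old, new in pairs:
        replacement = f"#include {new}"
        # A function replacement avoids backslash processing in the new text.
        source = include_pattern(old).sub(lambda m: replacement, source)
    return source


SOURCE = (
    '#include "stupid_name.h"\n'
    '#include <legacy/stupid_nameb.h>\n'
    '#include "other.h"\n'
)
PAIRS = [
    ("stupid_name.h", "<dir/new_name.h>"),
    ("stupid_nameb.h", "<dir/new_nameb.h>"),
]

result = rewrite_includes(SOURCE, PAIRS)
print(result)
```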

The script could be written as a multi-threaded utility, although it doesn't seem like there would be a lot of benefit in this type of situation. If I understand the question, this should be basically a one-time usage so high performance does not seem like the top priority.
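
If you did want the multi-threaded variant, a thread pool over the file list is probably the simplest shape. The Python sketch below is an assumption-laden example, not a benchmarked tool: `rewrite_file` and `rewrite_tree` are hypothetical names, the replacements are plain literal strings, and in CPython threads mainly help by overlapping file I/O rather than by speeding up the string work.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def rewrite_file(path, pairs):
    """Apply each literal (old, new) replacement to one file; True if it changed."""
    original = path.read_text(encoding="utf-8")
    updated = original
    for old, new in pairs:
        updated = updated.replace(old, new)
    if updated == original:
        return False
    path.write_text(updated, encoding="utf-8")
    return True


def rewrite_tree(paths, pairs, workers=4):
    """Rewrite files concurrently; each file is handled by exactly one worker."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda p: rewrite_file(p, pairs), paths))
    return [p for p, changed in zip(paths, flags) if changed]


# Small demo on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for i in range(8):
        p = Path(tmp) / f"file{i}.c"
        line = '#include "bad_one.h"\n' if i % 2 == 0 else '#include "fine.h"\n'
        p.write_text(line, encoding="utf-8")
        files.append(p)
    changed = rewrite_tree(files, [('#include "bad_one.h"', '#include "good_one.h"')])
    changed_names = [p.name for p in changed]
    first_text = files[0].read_text(encoding="utf-8")

print(changed_names)
```

Because no two workers ever touch the same file, no locking is needed.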

#4


1  

Make a series of perl one-liners to edit the files in place, like so:

perl -i.bak -p -e 's/stupid_old_name/cool_new_name/' *.c

This has the added bonus of saving the original of every file it processes with a .bak extension.

I'd make a bunch of these, if I didn't know perl that well. I'd even put all the one-liners into a shell script, but then I'm not trying to impress any of the unix graybeards out there.

This website explains edit in place with perl very well: http://www.rice.edu/web/perl-edit.html

PS - Since I do know perl fairly well, I'd just write the was/is table in a "real" perl script and use it to open and parse all the files.
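
The same was/is table idea, sketched in Python rather than perl so it can be compared with the other examples in this thread: all old names are compiled into one alternation and replaced in a single pass. `make_replacer` and the two table entries are hypothetical; longer names are tried first so that a name which extends another one is not cut short.

```python
import re


def make_replacer(table):
    """Build a function that replaces every key of `table` in one pass."""
    # Longest names first, so 'stupid_old_name_ex' wins over 'stupid_old_name'.
    names = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


WAS_IS = {
    "stupid_old_name": "cool_new_name",
    "stupid_old_name_ex": "extra_name",
}

replace_all = make_replacer(WAS_IS)
line = '#include "stupid_old_name.h" /* and stupid_old_name_ex.h */'
print(replace_all(line))
```

A single pass also means freshly inserted new names are never rescanned and replaced a second time, which can happen when one-liners are chained.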

#5


0  

Will this (Wingrep) do the trick?

#6


0  

PowerGREP can do that. It can search for multiple search strings (literal text or regular expressions) in any combination of files, and is multithreaded (starting with PowerGREP 4, the current version).

You can save your searches for later re-use, too.

#7


0  

On *nix (or with the GNU tools for Win32), you can use GNU find and sed together, e.g.:

find /path -type f -name "*.c" -exec  sed -i.bak 's/^\#include.*["<\\/]stupid_name.*$/#include <dir\/new_name.h>/' "{}" +;

Explanation:

The find command looks for regular files (-type f) starting from /path. -name "*.c" restricts the search to .c files, and sed is then run on the files found to change the old string to the new one. -i.bak asks sed to save each original file as a backup before editing it in place. "{}" stands for the file names passed to sed; the trailing + lets find hand them over in batches rather than one at a time.
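
For comparison with the scripting answers, roughly the same walk-and-edit can be sketched in Python. This is an approximation, not an exact port: `rewrite_c_files` is a hypothetical name, the pattern is tightened (my assumption) to require `stupid_name.h` followed by a closing quote or bracket, and unlike sed -i.bak it writes a .bak copy only for files that actually changed.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Whole #include line whose path ends in stupid_name.h.
INCLUDE_RE = re.compile(r'^#include.*["<\\/]stupid_name\.h[">].*$', re.MULTILINE)
NEW_LINE = "#include <dir/new_name.h>"


def rewrite_c_files(root):
    """Edit every *.c file under root in place, keeping a .bak copy of changed ones."""
    changed = []
    for path in sorted(Path(root).rglob("*.c")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        new_text = INCLUDE_RE.sub(NEW_LINE, text)
        if new_text != text:
            shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup first
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "sub"
    sub.mkdir()
    a = sub / "a.c"
    b = Path(tmp) / "b.c"
    a.write_text('#include "old/stupid_name.h"\nint x;\n', encoding="utf-8")
    b.write_text('#include "stupid_nameb.h"\n', encoding="utf-8")

    changed_names = [p.name for p in rewrite_c_files(tmp)]
    a_text = a.read_text(encoding="utf-8")
    a_backup = (sub / "a.c.bak").read_text(encoding="utf-8")
    b_text = b.read_text(encoding="utf-8")

print(changed_names)
```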
