Extract a column from multiple files into a single output file from the command line

Say I have a tab-delimited data file with 10 columns. With awk, it's easy to extract column 7, for example, and output that into a separate file. (See this question, for example.)

What if I have 5 such data files, and I would like to extract column 7 from each of them and make a new file with 5 data columns, one for the column 7 of each input file? Can this be done from the command line with awk and other commands?

Or should I just write up a Python script to handle it?

2 Answers

#1

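awk '{a[FNR] = a[FNR] " " $7} END {for (i = 1; i <= FNR; i++) print a[i]}' file1 file2 file3 file4 file5

The array a holds one output line per record number, built up by appending column 7 from each file in turn.

FNR is the number of records read so far from the current input file; it restarts at 1 for the first record of each file.

END {for (i = 1; i <= FNR; i++) print a[i]} prints the contents of a after the last file has been read. The loop starts at 1 because FNR is never 0 while a record is being processed, and it uses the final value of FNR (the last file's record count), so the files are assumed to have the same number of lines. Each output line begins with a space, since the separator is added before every value.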

#2

If the data is small enough to store it all in memory then this should work:

awk '{out[FNR]=out[FNR] (out[FNR]?OFS:"") $7; max=(FNR>max)?FNR:max} END {for (i=1; i<=max; i++) {print out[i]}}' file1 file2 file3 file4 file5

If it isn't then you would need something fancier which could seek around file streams or read single lines from multiple files (a shell loop with N calls to read could do this).
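As a rough sketch of the "read single lines from multiple files" idea, here is a variant that stays in awk and uses getline instead of a shell read loop (a substitution on my part, not what the answer literally describes). It holds only one line per file in memory at a time. The function name col7_stream is hypothetical, and the code assumes tab-delimited input and that every argument is a readable file:

```shell
# Sketch only: col7_stream is a hypothetical helper name, not from the answer above.
# Assumes tab-delimited input; change FS/OFS for other delimiters.
col7_stream() {
    awk -v FS='\t' -v OFS='\t' '
    BEGIN {
        nfiles = ARGC - 1          # ARGV[1..nfiles] are the input file names
        while (1) {
            any = 0
            row = ""
            for (i = 1; i <= nfiles; i++) {
                cell = ""
                # Read exactly one line from file i; getline returns 1 on success.
                if ((getline line < ARGV[i]) > 0) {
                    any = 1
                    split(line, fields, FS)
                    cell = fields[7]
                }
                # Exhausted (or unreadable) files contribute an empty cell.
                row = row (i > 1 ? OFS : "") cell
            }
            if (!any) break        # every file is exhausted
            print row
        }
    }' "$@"
}
```

It would be called as col7_stream file1 file2 file3 file4 file5 > combined.txt. Because the program consists only of a BEGIN block, awk does not read the files itself; each getline pulls the next line from one file, so files of different lengths simply produce empty trailing cells.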
