How to write a sed script to grep information from a text file

I'm trying to do my homework that is restricted to only using sed to filter an input file to a certain format of output. Here is the input file (named stocks):

Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================

And the output needs to be:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

I did come up with a solution, but it's not efficient. Here is my sed script (named try.sed):

/.*;.*;[0-9].*/ { N
N
N
N
N
N
s/\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*\n\(.*\);.*;.*/\1, \2, \3, \4, \5, \6, \7/gp
}

The command that I run on shell is:

$ sed -nf try.sed stocks

My question is, is there a better way of using sed to get the same result? The script I wrote only works with 7 lines of data. If the data is longer, I need to re-modify my script. I'm not sure how I can make it any better, so I'm here asking for help!

Thanks for any recommendations.

4 solutions

#1

One more way using sed:

sed -ne '/^====/,/^====/ { /;/ { s/;.*$// ; H } }; $ { g ; s/\n// ; s/\n/, /g ; p }' stocks

Output:

BAC, CSCO, INTC, MSFT, VZ, KO, MMM

Explanation:

-ne               # Process each input line without printing and execute next commands...
/^====/,/^====/   # For all lines between these...
{
  /;/             # If line has a semicolon...
  { 
    s/;.*$//      # Remove characters from first semicolon until end of line.
    H             # Append content to 'hold space'.
  }
};
$                 # In last input line...
{
  g               # Copy content of 'hold space' to 'pattern space' to work with it.
  s/\n//          # Remove first newline character.
  s/\n/, /g       # Substitute the remaining newlines with the output separator, comma and space in this case.
  p               # Print to output.
}
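
Since the question is about not depending on the number of rows, here is a minimal sketch of the same command laid out as a multi-line script and fed the original data plus one extra row. The function name `extract_symbols` and the `XYZ` row are made up for illustration; they are not part of the question's data.

```shell
# Hypothetical wrapper around the one-liner above, split across lines.
# Reads the file(s) given as arguments, or standard input if none.
extract_symbols() {
  sed -n '
    /^====/,/^====/ {
      /;/ {
        s/;.*$//
        H
      }
    }
    $ {
      g
      s/\n//
      s/\n/, /g
      p
    }
  ' "$@"
}

# The question's input with one extra, made-up row (XYZ) appended.
extract_symbols <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
XYZ;Example Corp (hypothetical row);1,000

================================================
EOF
# prints: BAC, CSCO, INTC, MSFT, VZ, KO, MMM, XYZ
```

Every data row is collected with H, so adding or removing rows needs no change to the script.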

#2

Edit: I've edited my algorithm, since I had neglected to consider the header and footer (I thought they were just for our benefit).

sed, by design, reads every line of an input file and applies commands to the lines that match some address (or to every line, if no address is given). If you're tailoring your script to a certain number of lines, you're definitely doing something wrong! I won't write you a script since this is homework, but the general idea for one way to go about it is to write a script that does the following. Think of the ordering as the order things should appear in the script.

Skip the first three lines using d, which deletes the pattern space and immediately moves on to the next line.

For each line that isn't a blank line, do the following steps. (This would all be in a single set of curly braces.)
1. Replace everything after and including the first semicolon (;) with a comma-and-space (", ") using the s (substitute) command.
2. Append the current pattern space into the hold buffer (look at H).
3. Delete the pattern space and move on to the next line, just as d does for the header lines above.

For each line that gets to this point in the script (should be the first blank line), retrieve the contents of the hold space into the pattern space. (This would be after the curly braces above.)

Substitute all newlines in the pattern space with nothing.

Next, substitute the last comma-and-space in the pattern space with nothing.

Finally, quit the program so you don't process any more lines. My script worked without this, but I'm not 100% sure why.

That being said, that's just one way to go about it. sed often offers varying ways of varying complexity to accomplish a task. A solution I wrote with this method is 10 lines long.

As a note, I don't bother suppressing printing (with -n) or manually printing (with p); each line is printed by default. My script runs like this:

$ sed -f companies.sed companies 
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
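
For readers who want to see the outline in running form, the steps described in this answer could be sketched roughly as follows. This is not the answerer's actual script (which isn't shown); the function name `symbols_by_steps` is made up, and the sample input is just the first three data rows from the question.

```shell
# Hypothetical wrapper; the sed commands follow the steps in order:
#   1,3d       skip the three header lines
#   /./ {...}  for each non-blank line: cut from the first ';', append to hold, delete
#   g          on the first blank line, fetch the hold space
#   s/\n//g    remove all newlines
#   s/, $//    remove the trailing comma-and-space
#   q          quit (the pattern space is printed automatically, no -n or p)
symbols_by_steps() {
  sed '
    1,3d
    /./ {
      s/;.*/, /
      H
      d
    }
    g
    s/\n//g
    s/, $//
    q
  ' "$@"
}

symbols_by_steps <<'EOF'
Symbol;Name;Volume
================================================

BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784

================================================
EOF
# prints: BAC, CSCO, INTC
```

Without the final q, the footer line of equals signs is non-blank, so it falls into the braces and is deleted like any data line; that may be why quitting turns out to be optional for this particular input.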

#3

This sed command extracts the symbols, printing one per line (-r enables extended regular expressions in GNU sed):

sed -rn '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt

Or on Mac (BSD sed uses -E instead of -r):

sed -En '/[0-9]+$/{s/^([^;]*).*$/\1/p;}' file.txt
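
Note that the commands above print each symbol on its own line, while the question asks for a single comma-separated line. One possible way to get there, sketched with the same "line ends in a digit" test plus the hold space, is shown below. The function name `symbols_joined` is made up, basic regular expressions are used so neither -r nor -E is needed, and the sample input is the last three data rows from the question.

```shell
# Hypothetical helper: collect the symbol of every line ending in a digit,
# then join the collected symbols with ", " when the last line is reached.
symbols_joined() {
  sed -n \
    -e '/[0-9]$/{s/;.*//;H;}' \
    -e '${g;s/\n//;s/\n/, /g;p;}' \
    "$@"
}

symbols_joined <<'EOF'
Symbol;Name;Volume
================================================

VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453

================================================
EOF
# prints: VZ, KO, MMM
```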

#4

This might work for you:

sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stocks

We don't want the heading line, so let's delete it: 1d

All data items are delimited by semicolons, so let's concentrate on the lines that contain one: /;/

For those lines, delete everything from the first ; to the end of the line and then stash what remains in the hold space (HS): {s/;.*//;H}

When you get to the last line, overwrite it with the HS using the g command, delete the first character (the leading newline generated by the first H command), replace all remaining newlines with a comma and a space, and quit, which prints what's left: ${g;s/.//;s/\n/, /g;q}

Delete everything else: d

Here's a terminal session showing the incremental refinement of building a sed command:

cat <<! >stock # paste the file into a here doc and pass it on to a file
> Symbol;Name;Volume
> ================================================
> 
> BAC;Bank of America Corporation Com;238,059,612
> CSCO;Cisco Systems, Inc.;28,159,455
> INTC;Intel Corporation;22,501,784
> MSFT;Microsoft Corporation;23,363,118
> VZ;Verizon Communications Inc. Com;5,744,385
> KO;Coca-Cola Company (The) Common;3,752,569
> MMM;3M Company Common Stock;1,660,453
> 
> ================================================
> !
sed '1d;/;/!d' stock # delete headings and everything but data lines
BAC;Bank of America Corporation Com;238,059,612
CSCO;Cisco Systems, Inc.;28,159,455
INTC;Intel Corporation;22,501,784
MSFT;Microsoft Corporation;23,363,118
VZ;Verizon Communications Inc. Com;5,744,385
KO;Coca-Cola Company (The) Common;3,752,569
MMM;3M Company Common Stock;1,660,453
sed '1d;/;/{s/;.*//p};d' stock # delete all non essential data
BAC
CSCO
INTC
MSFT
VZ
KO
MMM
sed '1d;/;/{s/;.*//;H};${g;l};d' stock # use the l command to see what's really there!
\nBAC\nCSCO\nINTC\nMSFT\nVZ\nKO\nMMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;l};d' stock # refine refine
BAC, CSCO, INTC, MSFT, VZ, KO, MMM$
sed '1d;/;/{s/;.*//;H};${g;s/.//;s/\n/, /g;q};d' stock # all done!
BAC, CSCO, INTC, MSFT, VZ, KO, MMM
