linux_shell_5: Shell Features, Regular Expressions (Part 1)

In the previous articles we looked at some features of the Linux shell; the fourth article is linux_shell_4 (shell features).

Here we continue with a feature that is central to working in the Linux shell: regular expressions.

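Regular expressions are mainly used to process character streams. Their unit of work is a line of text: tools such as grep apply the expression to one line at a time.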

【1】Single-character matching

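In bash you can use ? to match any single character, but in a regular expression ? is not a single-character wildcard, so be careful here. In grep's basic syntax ? is an ordinary character. With grep -E it means "zero or one of the preceding item".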

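In a regular expression you can use:

[a-z]     # matches one lowercase letter
[A-Z]     # matches one uppercase letter
[a-zA-Z]  # matches one letter of either case

Each of these three expressions matches exactly one character.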
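Here is a quick sketch of the three bracket forms. It runs against a few hypothetical words instead of the sample file. It assumes GNU grep and sets LC_ALL=C so that a range such as [a-z] means plain ASCII letters. The last two lines contrast ? in basic syntax with ? under grep -E.

```shell
#!/usr/bin/env bash
# Assumption: GNU grep; C locale so that [a-z] means the ASCII letters a..z only.
export LC_ALL=C

# Hypothetical input, one word per line.
sample() { printf '%s\n' cat Cat c4t ct 'c?t'; }

sample | grep 'c[a-z]t'        # cat : exactly one lowercase letter between c and t
sample | grep '[A-Z]at'        # Cat : exactly one uppercase letter before "at"
sample | grep '^c[a-zA-Z]t$'   # cat : "ct" fails, the bracket must consume one character
sample | grep 'c?t'            # c?t : in basic syntax ? is an ordinary character
sample | grep -cE 'c?t'        # 5   : with -E, c? means "optional c", so every line with a t matches
```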

As an example we will use VBird's online sample file: http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt

You can download it with wget:

wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt

Its contents are:

"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $  dollars.
GNU is free air not free beer.
Her hair is very beauty.
I can't finish the test.
Oh! The soup taste good.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!    My god!
The gd software is a library for drafting programs.
You are the best is mean you are the no. .
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird

Exp: find the lines that contain test or taste

[volcanol@volcanol ~]$ grep -n --color=auto 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.

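Line 8 contains the sequence test, which t[ae]st matches. Line 9 contains taste, and its first four letters (tast) match as well. Any other line containing test or tast, even inside a longer word, would also be printed. To narrow the result down to the words test and taste, we can modify the expression: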

[volcanol@volcanol ~]$ grep -n --color=auto 't[a-z]st' regular_express.txt | grep 't[ae]st[^a-df-z]'
8:I can't finish the test.
9:Oh! The soup taste good.

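The second grep keeps only test or tast followed by a character outside the ranges a-d and f-z, so the e of taste still passes. Let's test it: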

[volcanol@volcanol ~]$ echo "i can't finish the testf" | grep -n --color=auto 't[a-z]st' | grep 't[ae]st[^a-df-z]'
[volcanol@volcanol ~]$ echo "i can't finish the teste" | grep -n --color=auto 't[a-z]st' | grep 't[ae]st[^a-df-z]'
1:i can't finish the teste
[volcanol@volcanol ~]$ echo "i can't finish the tastf" | grep -n --color=auto 't[a-z]st' | grep 't[ae]st[^a-df-z]'

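Only the second command prints anything. The pipeline rejects testf and tastf as intended, but it has two flaws:

- It still accepts teste, which is neither test nor taste.
- It misses a test at the very end of a line, because [^a-df-z] has to consume one more character.

So it narrows the search but is not an exact match. Whole-word matching with alternation, grep -nwE 'test|taste', is a tighter alternative.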
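The sketch below runs both approaches over hypothetical lines: the two target lines, the echo inputs above, and one line that ends in test. It assumes GNU grep. -w (whole words only) and -E (extended syntax, where | means "or") are standard grep options.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: C locale, so [a-z] covers ASCII letters only

# Hypothetical input: the two target lines, the three echo tests, and a line ending in "test".
lines() {
  printf '%s\n' "I can't finish the test." "Oh! The soup taste good." \
    "i can't finish the testf" "i can't finish the teste" \
    "i can't finish the tastf" "the last word is test"
}

# Two-stage filter from the text: keeps lines 1 and 2, but also keeps "teste"
# and drops line 6, because [^a-df-z] needs one more character after "test".
lines | grep 't[a-z]st' | grep 't[ae]st[^a-df-z]'

# -E enables alternation with |, -w requires whole-word matches: prints lines 1, 2 and 6.
lines | grep -nwE 'test|taste'
```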

【2】grep

grep is a Linux utility for pattern matching, used to filter data. Its syntax is:

SYNOPSIS

grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

grep has two invocation forms, and the first is the one used most often: grep OPTIONS PATTERN FILE...
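Here is a sketch of both synopsis forms. It uses a throwaway file created with mktemp; the file contents and the pattern file are hypothetical, made up only for the demo. The -e and -f options are the ones listed in the synopsis above.

```shell
#!/usr/bin/env bash
# Hypothetical demo files; removed when the script exits.
demo=$(mktemp); pats=$(mktemp)
trap 'rm -f "$demo" "$pats"' EXIT
printf '%s\n' "I can't finish the test." "Oh! The soup taste good." "I like dog." > "$demo"

# Form 1: grep [OPTIONS] PATTERN [FILE...]
grep -n 'dog' "$demo"                  # 3:I like dog.

# Form 2a: several -e PATTERN options; a line matching any of them is printed.
grep -n -e 'test' -e 'taste' "$demo"   # lines 1 and 2

# Form 2b: -f FILE reads one pattern per line from FILE.
printf '%s\n' 'test' 'dog' > "$pats"
grep -n -f "$pats" "$demo"             # lines 1 and 3
```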

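One more point: the pattern may be wrapped in single or double quotes, and in bash the two differ in some cases. Inside double quotes the shell still expands variables and command substitutions before grep sees the pattern. Single quotes pass the text through unchanged. Single quotes are therefore recommended for patterns, to avoid side effects. For example, when searching for $SHELL, "$SHELL" and '$SHELL' are not the same.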
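The sketch below runs the same comparison on a throwaway file instead of regular_express.txt. SHELL is pinned to /bin/bash inside the script, an assumption so that the result does not depend on the login shell. It also prints the exact pattern each quoting style hands to grep.

```shell
#!/usr/bin/env bash
SHELL=/bin/bash   # assumption: pinned for the demo
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '%s\n' 'dido $SHELL' '/bin/bash' > "$demo"   # single quotes: stored literally

printf 'double quotes pass: %s\n' "$SHELL"   # double quotes pass: /bin/bash
printf 'single quotes pass: %s\n' '$SHELL'   # single quotes pass: $SHELL

grep -n "$SHELL" "$demo"   # 2:/bin/bash    (grep searched for /bin/bash)
grep -n '$SHELL' "$demo"   # 1:dido $SHELL  (a $ that is not at the end is literal in basic syntax)
```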

We append two lines to the end of regular_express.txt:

[volcanol@volcanol ~]$ cat regular_express.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $  dollars.
GNU is free air not free beer.
Her hair is very beauty.
I can't finish the test.
Oh! The soup taste good.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!    My god!
The gd software is a library for drafting programs.
You are the best is mean you are the no. .
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird
dido $SHELL
/bin/bash

[volcanol@volcanol ~]$ grep --color=auto "$SHELL" regular_express.txt
/bin/bash
[volcanol@volcanol ~]$ grep --color=auto '$SHELL' regular_express.txt
dido $SHELL

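The two commands print different results. With double quotes, bash replaces $SHELL with its value, /bin/bash, before grep runs. With single quotes, grep receives the literal text $SHELL. Next, let's run some simple tests with grep.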

Exp1: list all lines that contain the

[volcanol@volcanol ~]$ grep --color=auto 'the' regular_express.txt
I can't finish the test.
the symbol '*' is represented as start.
You are the best is mean you are the no. .
The world <Happy> is the same with "glad".
google is the best tools for search keyword.

The command works: every output line contains the contiguous character sequence the.

Exp: list all lines that do not contain the

[volcanol@volcanol ~]$ grep -nv 'the' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $  dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
9:Oh! The soup taste good.
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh!    My god!
14:The gd software is a library for drafting programs.
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird
22:dido $SHELL
23:/bin/bash

The -v option inverts the selection (-n adds the line numbers).
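Here is a small sketch of -v on hypothetical lines. It adds -c, a standard grep option that prints the number of selected lines instead of the lines themselves. The two counts always add up to the total number of lines.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines; note that "The" (capital T) does not match 'the'.
sample() { printf '%s\n' "I can't finish the test." "Oh! The soup taste good." "I like dog." "the symbol"; }

sample | grep -c 'the'     # 2 : lines containing "the"
sample | grep -vc 'the'    # 2 : lines not containing "the"
sample | grep -vn 'the'    # 2:Oh! The soup taste good.  and  3:I like dog.
```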

Exp: case-insensitive selection

grep ignores case with the -i option:

[volcanol@volcanol ~]$ grep -in 'the' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. .
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

Lines 9 and 14, which contain the word only as The, are now selected as well.
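The sketch below compares -i with a bracket expression on hypothetical lines. [Tt] only covers the case of the first letter, while -i ignores case across the whole pattern.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines.
sample() { printf '%s\n' "Oh! The soup taste good." "The world is the same." "THE END" "I like dog."; }

sample | grep -n 'the'      # line 2 only
sample | grep -in 'the'     # lines 1, 2 and 3
sample | grep -n '[Tt]he'   # lines 1 and 2: "THE" still fails on the uppercase H
```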

Exp: show line numbers in the output

The -n option prefixes each output line with its line number:

[volcanol@volcanol ~]$ grep 'test' regular_express.txt
I can't finish the test.
[volcanol@volcanol ~]$ grep -n 'test' regular_express.txt
8:I can't finish the test.

With -n, the output line is prefixed with 8:, its line number in the file.

Exp: single-character matching

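Single-character matching in regular expressions differs from bash's wildcards. Outside regular expressions, bash uses ? to match any single character. A regular expression uses [ ] to match one character from a given set; its symbol for "any one character" is ., covered below.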

[volcanol@volcanol ~]$ grep -n 'ab[a-z]' regular_express.txt
5:However, this dress is about $  dollars.

In the author's terminal the matched text was shown in red. [a-z] matches any one character from a to z, so the pattern ab[a-z] matches any of aba, abb, abc, ..., aby, abz. The abo in about is such a match, so the line containing about is printed.
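Plain text loses the red highlighting, so this sketch uses grep's -o option instead. It prints only the matched part of each line. The input strings are hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # assumption: C locale, so [a-z] is the ASCII range

echo 'this dress is about' | grep -o 'ab[a-z]'   # abo
echo 'abz ab1 ab' | grep -o 'ab[a-z]'            # abz : "ab1" and a bare "ab" at the end do not match
```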

Exp: anchoring a match to the start or end of a line

Sometimes the match must be pinned to the start or end of a line. Use ^ to anchor it at the start and $ to anchor it at the end.

For example, to match the only at the beginning of a line:

[volcanol@volcanol ~]$ grep -n '^the' regular_express.txt
12:the symbol '*' is represented as start.

Only the line that begins with the is printed.

To match lines whose last character is d:

[volcanol@volcanol ~]$ grep -n 'd$' regular_express.txt
21:# I am VBird

Only line 21, whose last character is d, is printed.
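A minimal sketch of both anchors on hypothetical lines, compared with an unanchored search.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines.
sample() { printf '%s\n' "the symbol '*'" "I can't finish the test." "# I am VBird" "good."; }

sample | grep -n 'the'    # lines 1 and 2: "the" anywhere in the line
sample | grep -n '^the'   # line 1 only: "the" must be the first characters
sample | grep -n 'd$'     # line 3 only: "good." ends in ".", not "d"
sample | grep -n '^#'     # line 3: lines that start with # (e.g. comments)
```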

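Exp: note the difference between inverting the line selection (-v) and excluding characters ([^ ]), and between ^ as a line anchor and ^ as negation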

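When ^ is at the start of the pattern, it anchors the match to the beginning of the line. When it is the first character inside [ ], it excludes the listed characters, so [^g] means any one character except g. Combining the two anchors, the following command matches blank lines: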

[volcanol@volcanol ~]$ grep -n '^$' regular_express.txt
24:
25:

Lines 24 and 25 of the file are blank.
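Here is a sketch of the two meanings of ^, and of -v versus [^ ], on hypothetical words. The google/goooooogle pair echoes the sample file. Setting LC_ALL=C is an assumption so that [a-z] means ASCII lowercase.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Hypothetical input; lines 3 and 6 are empty.
sample() { printf '%s\n' 'google' 'tool' '' 'goooooogle' 'Apple' ''; }

sample | grep -n '^$'        # 3: and 6:  (line start immediately followed by line end)
sample | grep -vn '^$'       # 1, 2, 4, 5 : -v drops the blank lines
sample | grep -n '[^g]oo'    # 2, 4 : "oo" preceded by one character other than g
sample | grep -n '^[^a-z]'   # 5:Apple : first character is not a lowercase letter
sample | grep -vn 'oo'       # 3, 5, 6 : -v drops whole lines, [^ ] excludes one character
```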

A handy trick: cat -A, combined here with -n to number the lines, shows how Linux represents a blank line:

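cat -An regular_express.txt
     1  "Open Source" is a good mechanism to develop programs.$
   ........
    19  goooooogle yes!$
    20  go! go! Let's go.$
    21  # I am VBird$
    22  dido $SHELL$
    23  /bin/bash$
    24  $
    25  $

As you can see, on Linux the end of every line, blank or not, is shown as a single $ (a line feed). A blank line therefore shows only $. Windows text files end each line with a carriage return plus a line feed, which cat -A displays as ^M$.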
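The sketch below shows why the line ending matters for '^$'. It assumes GNU cat, whose -A option shows a carriage return as ^M. The input is hypothetical: two Unix lines followed by two Windows (CRLF) lines. A CRLF "blank" line still holds a \r, so '^$' does not match it.

```shell
#!/usr/bin/env bash
# Hypothetical input: "unix", an empty LF line, "win" with CRLF, an empty CRLF line.
printf 'unix\n\nwin\r\n\r\n' | cat -A
# unix$
# $
# win^M$
# ^M$

printf 'unix\n\nwin\r\n\r\n' | grep -c '^$'       # 1 : only the LF blank line
printf 'unix\n\nwin\r\n\r\n' | grep -c $'^\r*$'   # 2 : also allow an optional trailing \r
```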

Exp: any character . and repetition *

In regular expressions these symbols differ from the usual shell wildcards: any single character is written ., and * means repetition. For example:

[volcanol@volcanol ~]$ grep -n 'a.c' regular_express.txt
18:google is the best tools for search keyword.

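Here the arc in search matches the pattern a.c. After modifying the file, we run the test again:

[volcanol@volcanol ~]$ grep -n 'a.c' regular_express.txt
18:google is the best tools for search keyword.
25:///abc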

Now both search and abc match the pattern.
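A short sketch with -o, on hypothetical words, shows that . stands for exactly one character.

```shell
#!/usr/bin/env bash
# Hypothetical input words.
sample() { printf '%s\n' 'search' 'abc' 'abdddc' 'ac'; }

sample | grep -o 'a.c'   # arc, abc : exactly one arbitrary character between a and c
sample | grep -c 'a.c'   # 2 : "abdddc" has three characters in between, "ac" has none
```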

So what does * mean in a regular expression? It means repetition:

[volcanol@volcanol ~]$ grep -n 'a.*c' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
10:motorcycle is cheap than car.
18:google is the best tools for search keyword.
25:///abc
26:////abdddc

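With this pattern the result grows considerably. * lets the single item immediately before it repeat zero or more times. Here that item is ., which matches any character, so .* matches any run of characters, including an empty one. As a result, a.*c matches any line with an a followed somewhere later by a c. Here is another test: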

[volcanol@volcanol ~]$ grep -n 'bd*c' regular_express.txt
25:///abc
26:////abdddc

Line 25 matches with d repeated zero times, and line 26 with d repeated three times.
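The sketch below uses hypothetical words and prints the matched parts with -o. The last line shows that * is greedy: .* extends as far as it can, so the match runs to the last c on the line.

```shell
#!/usr/bin/env bash
# Hypothetical input words.
sample() { printf '%s\n' 'abc' 'abdddc' 'ac' 'search'; }

sample | grep -o 'bd*c'            # bc, bdddc : d repeated 0 and 3 times
sample | grep -c 'a.*c'            # 4 : .* may be empty, so "ac" matches too
echo 'a1c a2c' | grep -o 'a.*c'    # a1c a2c : greedy, one match up to the last c
```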

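Exp: specifying the number of repetitions

* allows any number of repetitions, but sometimes we need a bounded count; for that we use { }. In grep's basic syntax plain braces are ordinary characters. To act as a repetition count they must be written with a backslash, as \{ and \}. With grep -E, plain { } work.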

[volcanol@volcanol ~]$ grep -n 'bd\{2,\}c' regular_express.txt
26:////abdddc
[volcanol@volcanol ~]$ grep -n 'bd\{3,\}c' regular_express.txt
26:////abdddc
[volcanol@volcanol ~]$ grep -n 'bd\{4,\}c' regular_express.txt

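The first command matches d repeated 2 or more times, the second 3 or more times, and the third 4 or more times. The third prints nothing because abdddc contains only three d's.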

To give both a lower and an upper bound:

[volcanol@volcanol ~]$ grep -n 'bd\{2,4\}c' regular_express.txt
26:////abdddc
[volcanol@volcanol ~]$ grep -n 'bd\{2,6\}c' regular_express.txt
26:////abdddc

The first command allows 2 to 4 repetitions and the second 2 to 6. abdddc, with three d's, satisfies both.
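The sketch below covers the interval forms on hypothetical words with 0, 2, 3 and 6 d's. It includes an exact count, \{3\}, and the extended-syntax spelling. The last line shows that unescaped braces in basic syntax are just literal characters.

```shell
#!/usr/bin/env bash
# Hypothetical input: 0, 2, 3 and 6 d's between b and c.
sample() { printf '%s\n' 'abc' 'abddc' 'abdddc' 'abddddddc'; }

sample | grep -c 'bd\{2,\}c'    # 3 : two or more d
sample | grep -c 'bd\{2,4\}c'   # 2 : two to four d
sample | grep 'bd\{3\}c'        # abdddc : exactly three d
sample | grep -cE 'bd{2,4}c'    # 2 : same interval in extended syntax, no backslashes
sample | grep -c 'bd{2,4}c'     # 0 : in basic syntax plain braces are literal characters
```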

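Summary:

When using tools such as grep, keep in mind that they interpret patterns as regular expressions, which differ from ordinary shell wildcards. Otherwise you will not get the results you expect.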
