How to grep for exactly matching special characters in a file?

Time: 2021-11-06 05:57:48

I have a file like the one below:

A   4   ab,cc,ab,bc
B   6   x,xx,y,%,%%,\,\\
AB  0   

I need to grep for special characters in the third column of the file and return the corresponding first column. For example, searching for '%' should return B (the first column of the matching line).

I have tried using:

grep -w "%" file1

But this matches both % and %%:

B   6   x,xx,y,%,%%,\,\\

Here both % and %% are highlighted. I only want to match the exact word or character I searched for; in the case above, it should find only '%' and not '%%'. This approach works fine for ordinary words, since according to the grep manual, grep -w selects only lines containing matches that form whole words.

I also tried a Perl-style pattern:

grep -wP "%" file1

But that did not return anything.

Can anyone suggest how I can grep for exactly matching special characters? The approach above also does not work for the special character '\', although a backslash can be escaped and handled. For the other special characters I still need a solution.


OK, my question needs a slight change. All the answers given here are great and work well for the question as asked, but I missed a requirement. My bad. All the solutions here use '%' as the test parameter, but '%' was only my example. What I am really looking for is a generalized solution that works for all words and characters. Here is an example. Consider the file below:

A   4   a    b,c            c,ab,bc
          ^          ^
          ^     couple of tabs here
      multiple spaces here
B   6   x,xx,y,%,%%,\,\\
AB  0 

What I mean is that the file can contain any sort of characters and words (separated by single or multiple spaces, tabs, etc.) as well as any special characters, including the single quote ('), double quote (") and backslash (\). These three need special handling because they are effectively reserved characters.

I apologize for missing this part before, but I hope the kind of solution I am looking for is clear now.

I would upvote all the working solutions for special characters, but I am not allowed to (not enough reputation). Is there a general solution? Or could I perhaps separate words (letters and numbers) from special characters with an if condition in a shell script?

Thanks in advance

5 solutions

#1


2  

Using perl from the command line:

perl -nE 'say /(\S+)/ if /%/' file

#2


1  

What about awk?

$ awk '/%/{print $1}' inputFile
B

To match an exact % in the file, you can use lookarounds (these need grep's -P option):

$ grep -oP '(?<!%)%(?!%)' input
  • (?<!%) Negative lookbehind. Asserts that the % is not preceded by %

  • (?!%) Negative look ahead. Asserts that the % is not followed by %
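
A pattern-free variant of the awk approach is also possible. The sketch below is a minimal illustration, assuming the third column is a comma-separated list with no embedded whitespace (as in the first sample file). It splits that column on commas and compares each item as a plain string, so % and %% are distinct and nothing needs regex escaping. The function name first_col_exact and the NEEDLE environment variable are hypothetical names chosen for this example.

```shell
# first_col_exact NEEDLE FILE
# Print column 1 of every line whose comma-separated third column
# contains an item exactly equal to NEEDLE (string comparison, no regex).
first_col_exact() {
    # The needle travels through the environment: unlike `awk -v`,
    # ENVIRON does not process backslash escapes, so '\' arrives unchanged.
    NEEDLE=$1 awk '
        BEGIN { needle = ENVIRON["NEEDLE"] "" }   # "" forces a string value
        {
            n = split($3, items, ",")
            for (i = 1; i <= n; i++) {
                if ((items[i] "") == needle) {    # compare as strings
                    print $1
                    break                         # report each line once
                }
            }
        }
    ' "$2"
}
```

With the sample file, first_col_exact '%' file1 prints B, and first_col_exact '\' file1 prints B as well.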

#3


0  

You could use grep with the -P option.

$ grep -oP '^\S+(?=\s+\S+\s+\S*(?<!%)%(?!%)\S*)' file
B

Example:

$ cat hi
A   4   ab,cc,ab,bc
B   6   x,xx,y,%,%%,\,\\
AB  0   
C   6   x,xx,y,%%
$ grep -oP '^\S+(?=\s+\S+\s+\S*(?<!%)%(?!%)\S*)' hi
B

#4


0  

Using perl with autosplit (-a): this splits each line on whitespace into a zero-indexed array @F. I then print the first field ($F[0]) if the third field ($F[2]) matches the regex pattern.

The pattern uses negated character classes to match a single % surrounded by characters that are not %. Because each [^%] must consume a character, a % at the very start or end of the field does not match. You could match on the comma instead if you are always looking for a comma-separated field. If you don't know the delimiters, several other answers give examples of lookahead/lookbehind expressions.

$ perl -lane 'print $F[0] if $F[2] =~ "[^%]%[^%]" ' < file1 
B
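
For the follow-up requirement in the question, where the third column may itself contain spaces or tabs, a whitespace-split third field is no longer the whole column. The following is a hedged sketch under these assumptions: the first two columns contain no whitespace, items in the third column are separated by commas, and whitespace at the end of a line is not significant. It uses awk rather than perl, and the function name first_col_general and the NEEDLE variable are made up for illustration.

```shell
# first_col_general NEEDLE FILE
# Print column 1 of every line whose third column (everything after the
# first two whitespace-delimited columns) contains NEEDLE as a whole
# comma-separated item. Comparison is literal, so quotes and backslashes
# in NEEDLE need no escaping beyond normal shell quoting.
first_col_general() {
    NEEDLE=$1 awk '
        BEGIN { needle = ENVIRON["NEEDLE"] "" }   # "" forces a string value
        NF < 3 { next }                           # no third column on this line
        {
            rest = $0
            # Remove the first two columns and the whitespace around them.
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", rest)
            sub(/[ \t]+$/, "", rest)              # ignore trailing whitespace
            n = split(rest, items, ",")
            for (i = 1; i <= n; i++) {
                if ((items[i] "") == needle) {
                    print $1
                    break
                }
            }
        }
    ' "$2"
}
```

A single quote is passed as first_col_general "'" file1, a double quote as first_col_general '"' file1, and a backslash as first_col_general '\' file1.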

#5


0  

You can use a match with lookarounds:

$ grep -P '(?<=[\s,])%(?=,|$)' file
B   6   x,xx,y,%,%%,\,\\
               ^
               highlighted

This matches only a % that is preceded by whitespace or a comma and followed by a comma or the end of the line.

Explanation

grep -P '(?<=[\s,])%(?=,|$)'
  • -P makes grep interpret the pattern as a Perl-compatible regular expression.
  • (?<=X) means: check that X comes immediately before the match.
  • [\s,] means: either a whitespace character or a comma.
  • (?=Y) means: check that Y comes immediately after the match.
  • ,|$ means: either a comma or the end of the line ($ must stay outside brackets, where it would be a literal dollar sign).
