Using a bash variable as an array in awk to filter an input file by comparing against the array

Time: 2022-12-05 23:52:54

I have a bash variable like this:

val="abc jkl pqr"

And I have a file that looks something like this:

abc   4   5
abc   8   8
def   43  4
def   7   51
jkl   4   0
mno   32  2
mno   9   2
pqr   12  1

I want to discard the rows of the file whose first field isn't present in val:

abc   4   5
abc   8   8
jkl   4   0
pqr   12  1

My awk solution doesn't work at all, and I have no idea why:

awk -v var="${val}" 'BEGIN{split(var, arr)}$1 in arr{print $0}' file

2 answers

#1


4  

Just turn the words in the variable into array indexes:

awk -v var="${val}" 'BEGIN{split(var, arr)
                           for (i in arr) 
                               names[arr[i]]
                     }
                     $1 in names' file

As commented in the linked question, split() stores the pieces as the values of the array (under the numeric indexes 1, 2, 3, ...), whereas the in operator tests indexes. The trick is to build a second array whose indexes are those values.
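The difference between values and indexes can be checked directly. The following is a minimal sketch, assuming the same val as in the question; the variable name split_dump is only illustrative. It prints what split() stores and what the in operator reports for a word and for a numeric index.

```shell
val="abc jkl pqr"

# split() stores the words as VALUES under the numeric indexes 1, 2, 3.
# Looping from 1 to n keeps the output order deterministic.
split_dump=$(awk -v var="$val" 'BEGIN {
    n = split(var, arr)
    for (i = 1; i <= n; i++)
        printf "arr[%d] = %s\n", i, arr[i]
    # "in" tests indexes, so the word itself is not found, but 1 is.
    print "abc in arr:", (("abc" in arr) ? "yes" : "no")
    print "1 in arr:", ((1 in arr) ? "yes" : "no")
}')
printf '%s\n' "$split_dump"
```

This is why the original $1 in arr never matches: the first fields are words such as abc, while the indexes of arr are 1, 2 and 3.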

As you can see, the pattern $1 in names suffices on its own; you don't need the action {print $0}, since printing the line is the default action when a pattern matches.

As a one-liner:

$ awk -v var="${val}" 'BEGIN{split(var, arr); for (i in arr) names[arr[i]]} $1 in names' file
abc   4   5
abc   8   8
jkl   4   0
pqr   12  1
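To try this without an existing input file, the run can be reproduced end to end. This is a sketch under the assumption that mktemp is available; the names input and result are illustrative. The sample rows are the ones from the question.

```shell
val="abc jkl pqr"

# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
printf '%s\n' \
    'abc   4   5' 'abc   8   8' 'def   43  4' 'def   7   51' \
    'jkl   4   0' 'mno   32  2' 'mno   9   2' 'pqr   12  1' > "$input"

# Build the lookup array in BEGIN, then print every row whose
# first field is an index of that array.
result=$(awk -v var="$val" '
    BEGIN { n = split(var, arr); for (i = 1; i <= n; i++) names[arr[i]] }
    $1 in names
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```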

#2


0  

grep -E "$( echo "${val}"| sed 's/ /|/g' )" YourFile

# or

awk -v val="${val}" 'BEGIN{gsub(/ /, "|",val)} $1 ~ val' YourFile

Grep:

  • It uses a regex (extended syntax, enabled with -E) to keep every line that contains one of the values. Note that the match can occur anywhere on the line, not only in the first field. The regex is built on the fly in a command substitution (a subshell), where sed replaces each space separator with |, meaning OR.
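Because the grep pattern is not tied to the first field, a row that carries one of the values in a later column is kept as well. One possible variation, not part of the original answer, anchors the alternation to the start of the line. The sketch below assumes that lines have no leading whitespace; the sample rows and the names pattern, anywhere and first_only are hypothetical.

```shell
val="abc jkl pqr"

# Hypothetical input: the second row has "abc" in a later column only.
sample=$(printf '%s\n' 'abc   4   5' 'def   abc 4' 'jkl   4   0')

# Turn the space-separated list into an alternation: abc|jkl|pqr
pattern=$(printf '%s\n' "$val" | sed 's/ /|/g')

# Unanchored, as in the answer: matches a value anywhere on the line.
anywhere=$(printf '%s\n' "$sample" | grep -E "$pattern")

# Anchored to the line start and followed by whitespace or end of line,
# so only the first field can match.
first_only=$(printf '%s\n' "$sample" | grep -E "^($pattern)([[:space:]]|\$)")

printf 'anywhere:\n%s\nfirst_only:\n%s\n' "$anywhere" "$first_only"
```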

Awk:

  • It uses the same principle as the grep version, but everything is done inside awk (so no subshell).
  • The awk variable val is assigned the value of the shell variable of the same name.
  • At the start of the script (before the first line is read), BEGIN{gsub(/ /, "|",val)} replaces each space in val with |.
  • Then, for every line whose first field matches (awk's default field separator is whitespace, so the first field is the letter group), the line is printed; printing is the default action of a bare pattern such as $1 ~ val. Since ~ is a regex match, a first field that merely contains one of the values also matches.
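Since $1 ~ val is a regex match, a first field such as xabcy would also be accepted. If exact matches are wanted, one option (a variation on the answer, not the answerer's own code) is to wrap the alternation in ^( ... )$. The sample rows and the names loose, strict and re below are hypothetical.

```shell
val="abc jkl pqr"

# Hypothetical input: "xabcy" merely contains "abc".
sample=$(printf '%s\n' 'abc   4   5' 'xabcy 1   1' 'pqr   12  1')

# Unanchored: the dynamic regex abc|jkl|pqr matches anywhere inside $1.
loose=$(printf '%s\n' "$sample" |
    awk -v val="$val" 'BEGIN { gsub(/ /, "|", val) } $1 ~ val')

# Anchored: ^(abc|jkl|pqr)$ has to match the whole first field.
strict=$(printf '%s\n' "$sample" |
    awk -v val="$val" 'BEGIN { gsub(/ /, "|", val); re = "^(" val ")$" } $1 ~ re')

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
```

Both regex forms treat the words in val as patterns, so values containing regex metacharacters would need escaping; the array lookup with $1 in names compares plain strings and avoids that.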
