AWK - Match the first column; if any value of the last field is greater than 1000, print both lines

Date: 2023-01-16 22:07:37

I'm having difficulty treating multiple lines as one group. I hope that getting this question resolved will help me with the next tasks I need to perform.

Logic

Group the lines by their first column. If the last field of any line in a group is greater than or equal to 1000, print every line in that group.

Current Code:

I've tried some basic code, but I know it fails because I'm not grouping the matching lines.

awk -F' ' '$1==$1 {print $0}' file | awk -v X=1000 -F' ' '{if($NF >= X)print $0}'

File

LSP0    NODE0   NODE4   NODE3   591
LSP0    NODE0   NODE4   NODE5   NODE3   515
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501

Desired Output

LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501

Also possible?

For lines whose first column matches, sum the last field, then re-sort the groups by that sum in descending order.

LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551

2 Solutions

#1


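gawk '
    # Buffer each group and total its last field.
    {lines[$1] = lines[$1] $0 ORS; sum[$1] += $NF}
    # Flag groups with at least one last field >= 1000.
    $NF >= 1000 {p[$1] = 1}
    END {
        # gawk-only: visit keys by sum, largest first.
        PROCINFO["sorted_in"] = "@val_num_desc"
        for (key in sum)
            if (p[key])
                printf "%s", lines[key]
    }
' file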
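
`PROCINFO["sorted_in"]` is specific to gawk. If only the filtering is needed and a different awk is in use, a single-pass sketch along the following lines might do. It prints the qualifying groups in first-seen order rather than by sum. The file name `lsp.txt`, the function name `filter_groups_onepass`, and the array names are made up for illustration.

```shell
# Sample data from the question, saved under a hypothetical file name.
cat > lsp.txt <<'EOF'
LSP0    NODE0   NODE4   NODE3   591
LSP0    NODE0   NODE4   NODE5   NODE3   515
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501
EOF

# Hypothetical helper: buffer every group, flag the ones that qualify,
# and print the flagged groups in the order their keys first appeared.
filter_groups_onepass() {
    awk '
        !($1 in lines) { order[++n] = $1 }     # remember first-seen key order
        { lines[$1] = lines[$1] $0 ORS }       # buffer the lines of each group
        $NF >= 1000    { keep[$1] = 1 }        # flag groups that qualify
        END {
            for (i = 1; i <= n; i++)
                if (order[i] in keep)
                    printf "%s", lines[order[i]]
        }
    ' "$1"
}

filter_groups_onepass lsp.txt
```

The whole file is held in memory, as in the gawk version, so this suits inputs of moderate size.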

#2


A double-scan algorithm:

$ awk 'NR==FNR{a[$1]+=($NF>=1000); next} a[$1]' file{,}

LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501

Mark the keys where the criterion matches in the first scan, then print the rows for those keys in the second scan. (`file{,}` is bash brace expansion for `file file`, so awk reads the same file twice.)
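
For readers who find the one-liner dense, here is the same two-pass idea spelled out with comments. It is only a sketch: `lsp.txt`, `filter_groups_twopass`, and the array name `hit` are hypothetical, and the file is named twice explicitly so the command does not depend on brace expansion.

```shell
# Sample data from the question, saved under a hypothetical file name.
cat > lsp.txt <<'EOF'
LSP0    NODE0   NODE4   NODE3   591
LSP0    NODE0   NODE4   NODE5   NODE3   515
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501
EOF

# Hypothetical helper: read the file twice.
filter_groups_twopass() {
    awk '
        NR == FNR {                          # true only while reading the first copy
            if ($NF >= 1000) hit[$1] = 1     # flag keys with a qualifying last field
            next
        }
        $1 in hit                            # second copy: print rows of flagged keys
    ' "$1" "$1"
}

filter_groups_twopass lsp.txt
```

Because nothing but the set of flagged keys is stored, memory use stays small even for large files, at the cost of reading the input twice.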

Here is the sorted variation:

$ awk 'NR==FNR {a[$1]+=($NF>=1000)?$NF:0; next} 
       a[$1]   {print a[$1] "\t" $0}' file{,} | sort -s -k1nr | cut -f2-

LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551

This assumes the values are positive, so that a qualifying group's sum cannot come out to zero, which would filter the group out.
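
If that assumption is a concern, one possible variation keeps the "qualifies" flag separate from the sum, so a group is never dropped because of its total, and every last field in the group counts toward the sort key. This is a sketch under assumptions: `lsp.txt`, `sort_groups_by_sum`, and the array names are hypothetical, and `sort -s` (stable sort) is relied on as in the command above, which GNU and BSD `sort` provide.

```shell
# Sample data from the question, saved under a hypothetical file name.
cat > lsp.txt <<'EOF'
LSP0    NODE0   NODE4   NODE3   591
LSP0    NODE0   NODE4   NODE5   NODE3   515
LSP1    NODE2   NODE4   NODE3   NODE6   5511
LSP1    NODE2   NODE1   551
LSP2    NODE4   NODE5   NODE7   60714
LSP2    NODE1   1501
EOF

# Hypothetical helper: decorate each kept row with its group sum,
# sort on that sum, then strip the decoration again.
sort_groups_by_sum() {
    awk '
        NR == FNR {                           # first pass over the file
            sum[$1] += $NF                    # total of all last fields per key
            if ($NF >= 1000) keep[$1] = 1     # flag kept apart from the sum
            next
        }
        $1 in keep { print sum[$1] "\t" $0 }  # second pass: prefix the group sum
    ' "$1" "$1" |
        sort -s -k1,1nr |                     # largest sum first, input order kept within a group
        cut -f2-                              # drop the prefixed sum
}

sort_groups_by_sum lsp.txt
```

For the sample data the group sums are 62215 for LSP2 and 6062 for LSP1, so the LSP2 lines are printed first.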
