How do I get the unique values from an array in Bash?

Date: 2022-11-24 21:50:01

I've got almost the same question as here.

I have an array which contains aa ab aa ac aa ad, etc. Now I want to select all unique elements from this array. I thought this would be simple with sort | uniq or with sort -u, as mentioned in that other question, but nothing changed in the array. The code is:

echo `echo "${ids[@]}" | sort | uniq`

What am I doing wrong?

11 answers

#1


76  

A bit hacky, but this should do it:

echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '

To save the sorted unique results back into an array, use array assignment:

sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

If your shell supports herestrings (bash should), you can spare an echo process by altering it to:

tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '

Input:

ids=(aa ab aa ac aa ad)

Output:

aa ab ac ad

Explanation:

  • "${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array".
  • tr ' ' '\n' - Convert all spaces to newlines. echo prints the array elements on a single line, separated by spaces, whereas sort expects each item on its own line.
  • sort -u - Sort and retain only unique elements.
  • tr '\n' ' ' - Convert the newlines we added earlier back to spaces.
  • $(...) - Command substitution.
  • Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
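
As a rough illustration of why the tr step matters, the sketch below counts the lines that sort would receive with and without it. The variable names lines_without_tr and lines_with_tr are made up for this example. Note that this approach assumes no element contains a space itself.

```shell
#!/usr/bin/env bash
ids=(aa ab aa ac aa ad)

# echo joins the elements with spaces, so sort would receive a single line
# and would have nothing to deduplicate.
lines_without_tr=$(( $(echo "${ids[@]}" | wc -l) ))

# After tr, every element sits on its own line, so sort -u can drop repeats.
lines_with_tr=$(( $(echo "${ids[@]}" | tr ' ' '\n' | wc -l) ))

echo "without tr: $lines_without_tr line(s), with tr: $lines_with_tr line(s)"
```

This is also why the command in the question appears to change nothing: sort | uniq only ever sees one line.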

#2


17  

If you're running Bash version 4 or above (which should be the case in any modern version of Linux), you can get unique array values in bash by creating a new associative array that contains each of the values of the original array. Something like this:

$ a=(aa ac aa ad "ac ad")
$ declare -A b
$ for i in "${a[@]}"; do b["$i"]=1; done
$ printf '%s\n' "${!b[@]}"
ac ad
ac
aa
ad

This works because in an associative array, each key can only appear once. When the for loop arrives at the second value of aa in a[2], it overwrites b[aa], which was originally set for a[0]. Note that the keys come back in no particular order.

Doing things in native bash can be faster than using pipes and external tools like sort and uniq.
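
If the original order matters and the result should end up in a regular array, one possible variation is sketched below. It is not part of the answer above: the names seen and unique are hypothetical, and it assumes Bash 4 or later and that no element is an empty string (as far as I know, an empty string is not a valid associative array key).

```shell
#!/usr/bin/env bash
# Assumes Bash 4+ (associative arrays).
a=(aa ac aa ad "ac ad")

declare -A seen=()   # keys are the values encountered so far
unique=()            # result, in first-seen order

for item in "${a[@]}"; do
  # ${seen[...]+x} expands to "x" only if the key already exists.
  if [[ -z ${seen["$item"]+x} ]]; then
    seen["$item"]=1
    unique+=("$item")
  fi
done

printf '%s\n' "${unique[@]}"
```

Here "ac ad" stays a single element because every expansion is quoted.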

#3


10  

If your array elements contain whitespace or any other shell special character (and can you be sure they don't?), then first of all (and you should always do this) expand your array in double quotes, e.g. "${a[@]}". Bash interprets this as "each array element as a separate argument". Within bash this always works.

Then, to get a sorted (and unique) array, we have to convert it to a format sort understands and be able to convert it back into bash array elements. This is the best I've come up with:

eval a=($(printf "%q\n" "${a[@]}" | sort -u))

Unfortunately, this fails in the special case of the empty array, turning the empty array into an array of 1 empty element (because printf had 0 arguments but still prints as though it had one empty argument - see explanation). So you have to catch that in an if or something.

Explanation: The %q format for printf "shell escapes" the printed argument in such a way that bash can recover it with something like eval. Because each element is printed shell-escaped on its own line, the only separator between elements is the newline, and the array assignment takes each line as an element, parsing the escaped values back into literal text.

e.g.

> a=("foo bar" baz)
> printf "%q\n" "${a[@]}"
foo\ bar
baz
> printf "%q\n"
''

The eval is necessary to strip the escaping off each value going back into the array.
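
One way to catch the empty-array case mentioned above is to check the element count before running the eval. The following is only a sketch with illustrative array names (a and empty), and it quotes the whole eval argument, which differs slightly from the one-liner above:

```shell
#!/usr/bin/env bash
a=("foo bar" baz "foo bar")
empty=()

# Only run the eval when there is at least one element. Otherwise
# printf '%q\n' with no arguments prints '' and yields one empty element.
if (( ${#a[@]} > 0 )); then
  eval "a=($(printf '%q\n' "${a[@]}" | sort -u))"
fi
if (( ${#empty[@]} > 0 )); then
  eval "empty=($(printf '%q\n' "${empty[@]}" | sort -u))"
fi

echo "a has ${#a[@]} element(s); empty has ${#empty[@]} element(s)"
```

As with any use of eval, this is only reasonable because %q has escaped the values first.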

#4


7  

I realize this was already answered, but it showed up pretty high in search results, and it might help someone.

printf "%s\n" "${IDS[@]}" | sort -u

Example:

~> IDS=( "aa" "ab" "aa" "ac" "aa" "ad" )
~> echo  "${IDS[@]}"
aa ab aa ac aa ad
~>
~> printf "%s\n" "${IDS[@]}" | sort -u
aa
ab
ac
ad
~> UNIQ_IDS=($(printf "%s\n" "${IDS[@]}" | sort -u))
~> echo "${UNIQ_IDS[@]}"
aa ab ac ad
~>

#5


6  

'sort' can be used to order the output of a for-loop:

for i in "${ids[@]}"; do echo "$i"; done | sort

and eliminate duplicates with "-u":

for i in "${ids[@]}"; do echo "$i"; done | sort -u

Finally you can just overwrite your array with the unique elements:

ids=( `for i in "${ids[@]}"; do echo "$i"; done | sort -u` )

#6


2  

This one also preserves the original order:

echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'

And to modify the original array with the unique values:

ARRAY=($(echo "${ARRAY[@]}" | tr '[:space:]' '\n' | awk '!a[$0]++'))
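
For readers unfamiliar with the awk idiom, '!a[$0]++' prints a line only the first time it is seen. A longer, roughly equivalent spelling is sketched below. The names seen and deduped are illustrative, and printf is used here instead of echo and tr simply to put one element per line.

```shell
#!/usr/bin/env bash
ARRAY=(aa ab aa ac aa ad)

# seen[$0] counts how often the current line has appeared so far;
# the line is printed only when that count is still zero.
deduped=$(printf '%s\n' "${ARRAY[@]}" |
  awk '{ if (seen[$0] == 0) print; seen[$0]++ }')

printf '%s\n' "$deduped"
```

Because nothing is sorted, the first occurrence of each value keeps its original position.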

#7


2  

To create a new array consisting of unique values, ensure your array is not empty, then do one of the following:

Remove duplicate entries (with sorting)

readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)

Remove duplicate entries (without sorting)

readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | awk '!x[$0]++')

Warning: Do not try to do something like NewArray=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) ). It will break on spaces.
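
To make the warning concrete, here is a small comparison using made-up sample data. The array name Broken is hypothetical, and readarray assumes Bash 4 or later.

```shell
#!/usr/bin/env bash
# Assumes Bash 4+ (readarray, also available as mapfile).
OriginalArray=("bar none" foo "bar none" foo)

# Unquoted command substitution: word splitting breaks "bar none" in two.
Broken=( $(printf '%s\n' "${OriginalArray[@]}" | sort -u) )

# readarray -t reads one element per line and keeps inner spaces intact.
readarray -t NewArray < <(printf '%s\n' "${OriginalArray[@]}" | sort -u)

echo "Broken: ${#Broken[@]} elements, NewArray: ${#NewArray[@]} elements"
```

Elements containing newlines would still be split by either approach, since both rely on one element per line.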

#8


0  

Without losing the original ordering:

uniques=($(tr ' ' '\n' <<<"${original[@]}" | awk '!u[$0]++' | tr '\n' ' '))

#9


0  

cat number.txt

1 2 3 4 4 3 2 5 6

Print the line as a column: cat number.txt | awk 'BEGIN{FS=" "} {for(i=1;i<=NF;i++) print $i}'

1
2
3
4
4
3
2
5
6

Find the duplicate records: cat number.txt | awk 'BEGIN{FS=" "} {for(i=1;i<=NF;i++) print $i}' |awk 'x[$0]++'

4
3
2

Remove duplicate records (keeping the first occurrence of each): cat number.txt | awk 'BEGIN{FS=" "} {for(i=1;i<=NF;i++) print $i}' |awk '!x[$0]++'

1
2
3
4
5
6

Find only the records that occur exactly once: cat number.txt | awk 'BEGIN{FS=" "} {for(i=1;i<=NF;i++) print $i|"sort|uniq -u"}'

1
5
6

#10


0  

Try this to get the unique values of the first column in a comma-separated file (the output order is not guaranteed):

awk -F, '{a[$1];}END{for (i in a)print i;}'
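
A quick way to try this out is to feed it a few lines on standard input. The sample data and the variable name first_column_values below are hypothetical, and a final sort is added only to make the output order predictable.

```shell
#!/usr/bin/env bash
# Hypothetical comma-separated sample data, piped in instead of read from a file.
first_column_values=$(printf '%s\n' 'aa,1' 'ab,2' 'aa,3' 'ac,4' |
  awk -F, '{a[$1];}END{for (i in a)print i;}' |
  sort)

printf '%s\n' "$first_column_values"
```

Merely referencing a[$1] creates the key in awk, which is why no assignment is needed.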

#11


0  

If you want a solution that only uses bash internals, you can set the values as keys in an associative array, and then extract the keys:

declare -A uniqs
list=(foo bar bar "bar none")
for f in "${list[@]}"; do 
  uniqs["${f}"]=""
done

for thing in "${!uniqs[@]}"; do
  echo "${thing}"
done

This will output

bar
foo
bar none
