Counting occurrences of arrays within a multidimensional array in Python

Time: 2022-05-30 02:24:20

I have the following type of arrays:

import numpy as np

a = np.array([[1,1,1],
              [1,1,1],
              [1,1,1],
              [2,2,2],
              [2,2,2],
              [2,2,2],
              [3,3,0],
              [3,3,0],
              [3,3,0]])

I would like to count how many times each distinct row occurs, for example:

[1,1,1]: 3, [2,2,2]: 3, and [3,3,0]: 3

How can I do this in Python? Is it possible without a for loop that counts into a dictionary? It has to be fast: it should take less than about 0.1 seconds. I looked into Counter, numpy bincount, etc., but those count individual elements, not whole arrays.

Thanks.

5 answers

#1


2  

If you don't mind mapping the rows to tuples just to get the counts, you can use a Counter dict. It runs in 28.5 µs on my machine under Python 3, which is well below your threshold:

In [5]: timeit Counter(map(tuple, a))
10000 loops, best of 3: 28.5 µs per loop

In [6]: c = Counter(map(tuple, a))

In [7]: c
Out[7]: Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

#2


2  

collections.Counter can do this conveniently, and the result looks almost like the example given.

>>> from collections import Counter
>>> c = Counter()
>>> for x in a:
...   c[tuple(x)] += 1
...
>>> c
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

This converts each row to a tuple. Tuples are immutable and hashable, so they can be used as dictionary keys; lists and NumPy arrays are mutable, so they cannot.
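
The hashability point can be checked directly. The following is a minimal sketch, not part of the original answer; the names `row`, `row_is_hashable`, and `key` are illustrative only.

```python
import numpy as np
from collections import Counter

row = np.array([1, 1, 1])

# A NumPy array is mutable, so hash() refuses it with a TypeError.
try:
    hash(row)
    row_is_hashable = True
except TypeError:
    row_is_hashable = False

# Converting the row to a tuple gives a hashable key that Counter can use.
key = tuple(row.tolist())
counts = Counter([key, key])
```

Here `row_is_hashable` ends up False, while the tuple key is counted twice without error.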

Why do you want to avoid using for loops?

And similar to @padraic-cunningham's much cooler answer:

>>> Counter(tuple(x) for x in a)
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
>>> Counter(map(tuple, a))
Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})

#3


2  

You could convert each row to a single linear index with np.ravel_multi_index, treating the row's elements as a multi-dimensional index (this requires non-negative integer values). np.unique with return_index=True then gives the position of the first occurrence of each unique row, and its optional return_counts argument gives the counts. The implementation would look something like this:

def unique_rows_counts(a):
    # Calculate linear indices using rows from a
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)

    # Get the unique indices and their counts
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)

    # Return the unique rows from a and their respective counts
    return a[unq_idx], counts
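
To make the linear-index step concrete, here is a small worked sketch on the question's array. The variable name `dims` is introduced only for illustration. It assumes, as the function above does, that all values are non-negative integers and that the product of the per-column extents fits in a platform integer.

```python
import numpy as np

a = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1],
              [2, 2, 2], [2, 2, 2], [2, 2, 2],
              [3, 3, 0], [3, 3, 0], [3, 3, 0]])

# Per-column extent (column maximum + 1): here [4, 4, 3].
dims = a.max(0) + 1

# Each row (i, j, k) maps to i*(4*3) + j*3 + k, so identical rows
# collapse to the same scalar and distinct rows to distinct scalars.
lidx = np.ravel_multi_index(a.T, dims)
# [1, 1, 1] -> 16, [2, 2, 2] -> 32, [3, 3, 0] -> 45
```

Counting unique values of `lidx` is then an ordinary 1D problem, which is why this approach avoids any Python-level loop.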

Sample run -

In [64]: a
Out[64]: 
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2],
       [2, 2, 2],
       [3, 3, 0],
       [3, 3, 0],
       [3, 3, 0]])

In [65]: unqrows, counts = unique_rows_counts(a)

In [66]: unqrows
Out[66]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 0]])
In [67]: counts
Out[67]: array([3, 3, 3])

Benchmarking

Assuming you are okay with either NumPy arrays or collections as outputs, the solutions provided so far can be benchmarked like so:

Function definitions:

import numpy as np
from collections import Counter

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))

def forloop_Counter(a):
    c = Counter()
    for x in a:
        c[tuple(x)] += 1
    return c

Timings:

In [53]: a = np.random.randint(0,4,(10000,5))

In [54]: %timeit map_Counter(a)
10 loops, best of 3: 31.7 ms per loop

In [55]: %timeit forloop_Counter(a)
10 loops, best of 3: 45.4 ms per loop

In [56]: %timeit unique_rows_counts(a)
1000 loops, best of 3: 1.72 ms per loop
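
Timings are only comparable if the approaches return the same counts. The sketch below checks that on random data of the same shape as the benchmark. It is an addition, not part of the original answer: it uses np.random.default_rng with a fixed seed (an assumption made for reproducibility, available in newer NumPy releases) in place of np.random.randint, and the name `as_counter` is illustrative.

```python
import numpy as np
from collections import Counter

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

# Same shape and value range as the benchmark input, with a fixed seed.
rng = np.random.default_rng(0)
a = rng.integers(0, 4, size=(10000, 5))

# Convert the vectorized result into a Counter keyed by plain tuples.
rows, counts = unique_rows_counts(a)
as_counter = Counter({tuple(r): int(c) for r, c in zip(rows.tolist(), counts)})

# Reference result from the tuple-mapping approach.
reference = Counter(map(tuple, a.tolist()))
results_agree = as_counter == reference
```

If `results_agree` is True, the speed difference reflects only the implementation and not a difference in what is being computed.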

#4


1  

The numpy_indexed package (disclaimer: I am its author) contains efficient vectorized functionality for these kinds of operations:

import numpy_indexed as npi
unique_rows, row_count = npi.count(a, axis=0)

Note that this works for arrays of any dimensionality or datatype.

#5


1  

Since NumPy 1.13.0, np.unique accepts an axis argument:

>>> np.unique(a, axis=0, return_counts=True)

(array([[1, 1, 1],
        [2, 2, 2],
        [3, 3, 0]]), array([3, 3, 3]))
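
If a mapping in the shape requested in the question ([1,1,1]: 3 and so on) is wanted, the two returned arrays can be zipped into a dictionary. This is a minimal sketch; the names `rows`, `counts`, and `result` are illustrative, and tuples are used as keys because lists and arrays are not hashable.

```python
import numpy as np

a = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1],
              [2, 2, 2], [2, 2, 2], [2, 2, 2],
              [3, 3, 0], [3, 3, 0], [3, 3, 0]])

# Unique rows (sorted lexicographically) and how often each occurs.
rows, counts = np.unique(a, axis=0, return_counts=True)

# Pair each unique row with its count, using plain Python types.
result = {tuple(r): int(c) for r, c in zip(rows.tolist(), counts)}
# {(1, 1, 1): 3, (2, 2, 2): 3, (3, 3, 0): 3}
```

The dictionary is built over the unique rows only, so the Python-level loop runs once per distinct row rather than once per input row.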
