Constructing a Python set from a NumPy matrix

Date: 2023-01-18 12:52:18

I'm trying to execute the following

>>> from numpy import *
>>> x = array([[3,2,3],[4,4,4]])
>>> y = set(x)
TypeError: unhashable type: 'numpy.ndarray'

How can I easily and efficiently create a set from a numpy array?

5 answers

#1


20  

If you want a set of the elements, here is another, probably faster way:

y = set(x.flatten())

PS: After comparing x.flat, x.flatten(), and x.ravel() on a 10x100 array, I found that they all run at about the same speed. For a 3x3 array, the fastest version is the iterator version:

y = set(x.flat)

which I would recommend because it uses the least memory (it scales well with the size of the array).
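
As a rough check of the three flattening options above, the sketch below confirms that they build the same set and illustrates the memory difference. It uses the `np` alias, which the original snippets do not, and relies on `flatten()` always copying while `ravel()` returns a view of a contiguous array.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

# All three approaches visit the same elements, so the sets are identical.
via_flat = set(x.flat)
via_flatten = set(x.flatten())
via_ravel = set(x.ravel())

# flatten() always allocates a new array; ravel() returns a view of a
# contiguous array, so its result shares memory with x.
flatten_shares = np.shares_memory(x, x.flatten())
ravel_shares = np.shares_memory(x, x.ravel())

print(via_flat == via_flatten == via_ravel)
print(flatten_shares, ravel_shares)
```

x.flat goes one step further and yields the elements one at a time without building an intermediate array.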

PS: There is also a NumPy function that does something similar:

y = numpy.unique(x)

This produces a NumPy array with the same elements as set(x.flat), but as a sorted array rather than a set. It is very fast (almost 10 times faster), but if you need a set, set(numpy.unique(x)) is a bit slower than the other approaches, because building a set carries a large overhead.
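
A minimal sketch of the numpy.unique route, assuming the same small integer array as in the question. The call to `tolist()` is an addition here, not part of the answer: it converts the result to plain Python ints before the set is built.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

unique_values = np.unique(x)          # sorted 1-D array of distinct elements
as_set = set(unique_values.tolist())  # tolist() yields plain Python ints

print(unique_values)
print(as_set)
```

If the array form is enough for the rest of the computation, the final conversion to a set can be skipped entirely.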

#2


11  

The immutable counterpart to an array is the tuple, so try converting the array of arrays into a sequence of tuples:

>>> from numpy import *
>>> x = array([[3,2,3],[4,4,4]])

>>> x_hashable = map(tuple, x)

>>> y = set(x_hashable)
>>> y
set([(3, 2, 3), (4, 4, 4)])
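
The session above shows Python 2 output. In Python 3, `map` returns a lazy iterator, which `set` consumes just the same. The sketch below is one possible Python 3 variant, with a duplicate row added for illustration. Going through `tolist()` first is an assumption of this sketch, made so that the tuples hold plain Python ints rather than NumPy scalars.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4], [3, 2, 3]])

# tolist() converts to nested Python lists, so each tuple holds plain ints.
rows = set(map(tuple, x.tolist()))

print(rows)
```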

#3


7  

The answers above work if you want a set of the elements contained in an ndarray. If you want a set of ndarray objects, or want to use ndarray objects as dictionary keys, you'll have to provide a hashable wrapper for them. See the code below for a simple example:

from hashlib import sha1

from numpy import all, array, uint8


class hashable(object):
    r'''Hashable wrapper for ndarray objects.

        Instances of ndarray are not hashable, meaning they cannot be added to
        sets, nor used as keys in dictionaries. This is by design - ndarray
        objects are mutable, and therefore cannot reliably implement the
        __hash__() method.

        The hashable class allows a way around this limitation. It implements
        the required methods for hashable objects in terms of an encapsulated
        ndarray object. This can be either a copied instance (which is safer)
        or the original object (which requires the user to be careful enough
        not to modify it).
    '''
    def __init__(self, wrapped, tight=False):
        r'''Creates a new hashable object encapsulating an ndarray.

            wrapped
                The wrapped ndarray.

            tight
                Optional. If True, a copy of the input ndarray is created.
                Defaults to False.
        '''
        self.__tight = tight
        self.__wrapped = array(wrapped) if tight else wrapped
        self.__hash = int(sha1(wrapped.view(uint8)).hexdigest(), 16)

    def __eq__(self, other):
        return all(self.__wrapped == other.__wrapped)

    def __hash__(self):
        return self.__hash

    def unwrap(self):
        r'''Returns the encapsulated ndarray.

            If the wrapper is "tight", a copy of the encapsulated ndarray is
            returned. Otherwise, the encapsulated ndarray itself is returned.
        '''
        if self.__tight:
            return array(self.__wrapped)

        return self.__wrapped

Using the wrapper class is simple enough:

>>> from numpy import arange

>>> a = arange(0, 1024)
>>> d = {}
>>> d[a] = 'foo'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> b = hashable(a)
>>> d[b] = 'bar'
>>> d[b]
'bar'
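
When a full wrapper class is more than is needed, a lighter alternative is to derive a hashable key from the array's contents. This is a different technique from the wrapper above, sketched under the assumption that equal shape, dtype, and bytes should count as the same key. The helper name `array_key` is hypothetical.

```python
import numpy as np


def array_key(arr):
    """Hypothetical helper: build a hashable key from an array's contents."""
    # Shape and dtype are included so that arrays with the same bytes but a
    # different layout or element type do not collide.
    return (arr.shape, arr.dtype.str, arr.tobytes())


a = np.arange(0, 1024)
b = np.arange(0, 1024)

d = {array_key(a): 'bar'}
print(d[array_key(b)])  # equal contents give the same key
```

The key is a snapshot: it does not change if the array is modified later, and the original array cannot be recovered from the dictionary without rebuilding it from the bytes.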

#4


3  

If you want a set of the elements:

>>> y = set(e for r in x
...             for e in r)
>>> y
set([2, 3, 4])

For a set of the rows:

>>> y = set(tuple(r) for r in x)
>>> y
set([(3, 2, 3), (4, 4, 4)])
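
For the set-of-rows case, NumPy can also remove duplicate rows itself before any Python set is built. The sketch below uses the `axis` argument of `np.unique`, which this answer does not mention. The sample array with a repeated row is made up for illustration.

```python
import numpy as np

x = np.array([[4, 4, 4], [3, 2, 3], [4, 4, 4]])

# axis=0 treats each row as one item; the result is sorted lexicographically.
unique_rows = np.unique(x, axis=0)

# Convert to tuples only if an actual Python set is required.
row_set = {tuple(r) for r in unique_rows.tolist()}

print(unique_rows)
print(row_set)
```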

#5


0  

I liked xperroni's idea, but I think the implementation can be simplified by inheriting directly from ndarray instead of wrapping it.

from hashlib import sha1
from numpy import ndarray, uint8, array

class HashableNdarray(ndarray):
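    def __hash__(self):
        # Cache the digest on first use. A single underscore avoids name
        # mangling, which would make a hasattr(self, '__hash') check always fail.
        if not hasattr(self, '_hash'):
            self._hash = int(sha1(self.view(uint8)).hexdigest(), 16)
        return self._hash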

    def __eq__(self, other):
        if not isinstance(other, HashableNdarray):
            return super(HashableNdarray, self).__eq__(other)
        return super(HashableNdarray, self).__eq__(super(HashableNdarray, other)).all()

A NumPy ndarray can be viewed as the derived class and then used as a hashable object. view(ndarray) converts it back, but in most cases that is not even needed.

>>> a = array([1,2,3])
>>> b = array([2,3,4])
>>> c = array([1,2,3])
>>> s = set()

>>> s.add(a.view(HashableNdarray))
>>> s.add(b.view(HashableNdarray))
>>> s.add(c.view(HashableNdarray))
>>> print(s)
{HashableNdarray([2, 3, 4]), HashableNdarray([1, 2, 3])}
>>> d = next(iter(s))
>>> print(d == a)
[False False False]
>>> import ctypes
>>> print(d.ctypes.data_as(ctypes.POINTER(ctypes.c_double)))
<__main__.LP_c_double object at 0x7f99f4dbe488>
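
Both this subclass and the wrapper in answer #3 derive the hash from the array's bytes, so changing an array after it has been added to a set or dict makes later lookups unreliable. One possible safeguard, sketched here as an assumption rather than as part of either answer, is to hash a read-only copy:

```python
import numpy as np

a = np.array([1, 2, 3])

frozen = a.copy()
frozen.setflags(write=False)  # later writes to this copy raise ValueError

try:
    frozen[0] = 99
    write_blocked = False
except ValueError:
    write_blocked = True

print(write_blocked)
print(a.flags.writeable)  # the original array is unaffected
```

The read-only copy can then be viewed as HashableNdarray or passed to the wrapper as before.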
