Efficiently applying a threshold function to a SciPy sparse csr_matrix

Time: 2022-04-28 01:58:58

I have a SciPy csr_matrix (a vector in this case) of 1 column and x rows. It contains float values that I need to convert to the discrete class labels -1, 0 and 1. This should be done with a threshold function that maps each float value to one of these 3 class labels.

Is there any way other than iterating over the elements as described in "Iterating through a scipy.sparse vector (or matrix)"? I would love an elegant way to simply apply map(thresholdfunc()) to all elements.

Note that while it is of type csr_matrix, it isn't actually sparse: it is just the return value of another function that involved a sparse matrix.

1 solution

#1


If you have an array, you can discretize it based on some condition with the np.where function. For example:

>>> import numpy as np
>>> x = np.arange(10)
>>> np.where(x < 5, 0, 1)
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

The syntax is np.where(BOOLEAN_ARRAY, VALUE_IF_TRUE, VALUE_IF_FALSE). You can nest one np.where call inside another to handle multiple conditions:

>>> np.where(x < 3, -1, np.where(x > 6, 0, 1))
array([-1, -1, -1,  1,  1,  1,  1,  0,  0,  0])
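When there are more than two conditions, nesting gets hard to read. A minimal sketch of the same mapping written with np.select instead (a swapped-in alternative, not part of the original answer), using the same x and thresholds as the nested np.where example above:

```python
import numpy as np

x = np.arange(10)

# Conditions are checked in order; the first one that is True for an
# element decides which choice that element gets.
conditions = [x < 3, x > 6]
choices = [-1, 0]

# Elements that match none of the conditions get the default value.
labels = np.select(conditions, choices, default=1)
print(labels)  # [-1 -1 -1  1  1  1  1  0  0  0]
```

Each extra class label is then just one more entry in conditions and choices, rather than another level of nesting.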

To apply this to the data in a CSR or CSC sparse matrix, you can use the .data attribute, which gives you access to the internal array of explicitly stored entries (for a matrix built from dense data, these are exactly its nonzero values). For example:

>>> from scipy import sparse
>>> mat = sparse.csr_matrix(x.reshape(10, 1))
>>> mat.data = np.where(mat.data < 3, -1, np.where(mat.data > 6, 0, 1))
>>> mat.toarray()
array([[ 0],
       [-1],
       [-1],
       [ 1],
       [ 1],
       [ 1],
       [ 1],
       [ 0],
       [ 0],
       [ 0]])
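One detail worth keeping in mind: .data holds only the stored entries, so implicit zeros are never passed to the threshold rule. In the example above, row 0 stays 0 even though the rule would map 0 (which is < 3) to -1, and rows 7 to 9 end up as explicitly stored zeros. A minimal sketch, assuming the same x and thresholds as above (the helper name threshold is hypothetical), that shows this, drops the stored zeros with eliminate_zeros(), and, since the question says the matrix is not really sparse, compares against thresholding a dense copy:

```python
import numpy as np
from scipy import sparse


def threshold(values):
    # Hypothetical helper with the answer's rule: < 3 -> -1, > 6 -> 0, else 1.
    return np.where(values < 3, -1, np.where(values > 6, 0, 1))


x = np.arange(10)
original = sparse.csr_matrix(x.reshape(10, 1))

# Option 1: threshold only the stored entries via .data.
mat = original.copy()
print(mat.nnz)  # 9: the 0 in row 0 is not stored
mat.data = threshold(mat.data)
sparse_labels = mat.toarray().ravel()
print(sparse_labels)  # [ 0 -1 -1  1  1  1  1  0  0  0]

# Rows 7-9 now hold explicitly stored zeros; remove them from storage.
mat.eliminate_zeros()
print(mat.nnz)  # 6

# Option 2: the matrix is not really sparse, so threshold a dense copy.
# Every element goes through the rule, including row 0 (0 < 3 -> -1).
dense_labels = threshold(original.toarray().ravel())
print(dense_labels)  # [-1 -1 -1  1  1  1  1  0  0  0]
```

Working on .data keeps the cost proportional to the number of stored entries, which matters for genuinely sparse input; for a matrix that is dense in practice, the dense copy is simpler and applies the rule to every element.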