How can I use vectorization to pick, for each value in an array, the closest value from an array of choices?

Time: 2023-01-28 12:32:13

I have an array of values, and I want to replace each one with a value from an array of choices, based on which choice is linearly closest.

The catch is that the size of choices is only defined at runtime.

import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])

If choices were static in size, I would simply use np.where:

# For each element, keep whichever choice has the smallest absolute difference
d = np.where(np.abs(a - choices[0]) <= np.abs(a - choices[1]),
      np.where(np.abs(a - choices[0]) <= np.abs(a - choices[2]), choices[0], choices[2]),
      np.where(np.abs(a - choices[1]) <= np.abs(a - choices[2]), choices[1], choices[2]))

To get the output:

>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])

Is there a way to do this more dynamically while still preserving the vectorization?

3 Answers

#1


3  

Subtract choices from a, take the absolute value, find the index of the minimum along the new axis, and substitute.

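import numpy as np

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices      # shape (3, 3, 3): every element minus every choice
np.absolute(b, out=b)          # absolute difference, computed in place
i = np.argmin(b, axis = -1)    # index of the nearest choice
a = choices[i]
print(a)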

>>> 
[[ 1  1  1]
 [ 5  5  5]
 [10 10 10]]

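a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices      # every element minus every choice
np.absolute(b, out=b)          # absolute difference, computed in place
i = np.argmin(b, axis = -1)    # index of the nearest choice
a = choices[i]
print(a)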

>>>    
[[ 1  1  1]
 [ 5 10  5]
 [10  1 10]]
>>> 

The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, so b.shape is (3,3,3); this link has a decent graphic. EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.
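
As a rough illustration of the shapes involved in that broadcast, the following sketch reuses the arrays from the first example. The name a_expanded is introduced here only for illustration.

```python
import numpy as np

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])

# Adding a trailing axis turns a from shape (3, 3) into (3, 3, 1).
a_expanded = a[:, :, None]
print(a_expanded.shape)   # (3, 3, 1)
print(choices.shape)      # (3,)

# Broadcasting aligns shapes from the right: (3, 3, 1) against (3,)
# stretches the length-1 axis to len(choices).
b = a_expanded - choices
print(b.shape)            # (3, 3, 3)

# np.broadcast reports the same shape without doing the subtraction.
print(np.broadcast(a_expanded, choices).shape)  # (3, 3, 3)
```

The last axis of the result always has length len(choices), so the same expression works whatever size choices has at runtime.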

For the second example:

>>> print(b)
[[[ 1  5 10]
  [ 2  2  7]
  [ 1  5 10]]

 [[ 3  1  6]
  [ 7  3  2]
  [ 3  1  6]]

 [[ 8  4  1]
  [ 0  4  9]
  [ 8  4  1]]]
>>> print(i)
[[0 0 0]
 [1 2 1]
 [2 0 2]]
>>> 

The final assignment uses an index array, also known as integer array indexing.
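
A small sketch of that indexing step in isolation, using the index array i from the second example. The names picked and same are illustrative only.

```python
import numpy as np

choices = np.array([1, 5, 10])
# Index array with the same shape as the desired result.
i = np.array([[0, 0, 0],
              [1, 2, 1],
              [2, 0, 2]])

# Each entry of i is replaced by choices[entry]; the result has i's shape.
picked = choices[i]
print(picked)
# [[ 1  1  1]
#  [ 5 10  5]
#  [10  1 10]]

# np.take performs the same lookup.
same = np.take(choices, i)
```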

In the second example, notice that there was a tie for element a[0,1]: either one or five could have been substituted. np.argmin returns the first minimum it finds, so one was chosen.
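
If the other side of a tie is wanted, one possible approach (a sketch, assuming choices is sorted in ascending order) is to reverse choices before taking argmin. The names low, high and rev are illustrative.

```python
import numpy as np

choices = np.array([1, 5, 10])
a = np.array([3])                      # exactly halfway between 1 and 5

dist = np.abs(a[..., None] - choices)  # [[2, 2, 7]]

# argmin returns the first minimum, so the tie resolves to the smaller choice.
low = choices[np.argmin(dist, axis=-1)]

# Reversing the choices before argmin resolves ties to the larger choice
# (assumes choices is sorted ascending).
rev = choices[::-1]
high = rev[np.argmin(np.abs(a[..., None] - rev), axis=-1)]

print(low, high)   # [1] [5]
```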

#2


2  

To explain wwii's excellent answer in a little more detail:

The idea is to create a new dimension that compares each element of a to each element of choices using numpy broadcasting. With the ellipsis syntax, this is easily done for an a with an arbitrary number of dimensions:

>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 1,  5, 10],
        [ 1,  5, 10]],
       [[ 3,  1,  6],
        [ 3,  1,  6],
        [ 3,  1,  6]],
       [[ 8,  4,  1],
        [ 8,  4,  1],
        [ 8,  4,  1]]])

Taking argmin along the axis you just created (the last axis, index -1) gives you the index into choices of the value you want to substitute:

>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])

This finally allows you to pick those elements from choices:

>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1,  1,  1],
       [ 5,  5,  5],
       [10, 10, 10]])
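
The three steps above can be wrapped in a single function. This is a minimal sketch; nearest_choice is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

def nearest_choice(a, choices):
    """Replace each element of a with the closest value in choices.

    Hypothetical helper name; works for any number of dimensions in a.
    """
    a = np.asarray(a)
    choices = np.asarray(choices)
    # Distance from every element of a to every choice,
    # shape a.shape + (len(choices),)
    dist = np.abs(a[..., np.newaxis] - choices)
    # Index of the nearest choice along the new last axis, then look it up.
    return choices[np.argmin(dist, axis=-1)]

print(nearest_choice([0, 4, 9], [1, 5, 10]))              # [ 1  5 10]
print(nearest_choice(np.zeros((2, 3, 4)), [1, 5]).shape)  # (2, 3, 4)
```

Because of the ellipsis, the same function handles 1-d, 2-d and higher-dimensional inputs without changes.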

For a non-symmetric shape:

Let's say a had shape (2, 5):

>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Then you'd get:

>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1,  5, 10],
        [ 0,  4,  9],
        [ 1,  3,  8],
        [ 2,  2,  7],
        [ 3,  1,  6]],

       [[ 4,  0,  5],
        [ 5,  1,  4],
        [ 6,  2,  3],
        [ 7,  3,  2],
        [ 8,  4,  1]]])

This is hard to read, but what it's saying is, b has shape:

>>> b.shape
(2, 5, 3)

The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea, look at each slice along the last axis:

>>> b[:, :, 0]  # = abs(a - 1)
array([[1, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> b[:, :, 1]  # = abs(a - 5)
array([[5, 4, 3, 2, 1],
       [0, 1, 2, 3, 4]])
>>> b[:, :, 2]  # = abs(a - 10)
array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 0, 1, 2.
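
That relationship can be checked directly. The sketch below rebuilds b for the (2, 5) example; per_choice is an illustrative name.

```python
import numpy as np

a = np.arange(10).reshape((2, 5))
choices = np.array([1, 5, 10])
b = np.abs(a[..., np.newaxis] - choices)

# Slice i of the last axis holds the distances to choices[i].
for i, c in enumerate(choices):
    assert np.array_equal(b[..., i], np.abs(a - c))

# Moving the last axis to the front gives one (2, 5) distance map per choice.
per_choice = np.moveaxis(b, -1, 0)
print(per_choice.shape)   # (3, 2, 5)
```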

Hope that helps explain this a little more clearly.

#3


2  

I love broadcasting and would have gone that way myself too. But with large arrays, I would like to suggest another approach with np.searchsorted that stays memory efficient and thus achieves performance benefits:

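def searchsorted_app(a, choices):
    # Index of the first choice >= a (right neighbour), clipped to stay in bounds
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    # Index of the last choice <= a (left neighbour), clipped to stay in bounds
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx) # Or choices[lidx]
    cr = np.take(choices,ridx) # Or choices[ridx]
    # Where the left neighbour is strictly closer, use it instead
    mask = np.abs(a - cl) > np.abs(a - cr)
    cl[mask] = cr[mask]
    return cl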

Please note that if the elements in choices are not sorted, we need to pass the additional sorter argument to np.searchsorted (the indices that sort choices, typically the result of np.argsort).
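
One way the sorter argument might be used is sketched below. searchsorted_unsorted is a hypothetical name, and the mapping back through order is an assumption about how the original function would be adapted, not code from the answer.

```python
import numpy as np

def searchsorted_unsorted(a, choices):
    """Nearest-choice lookup when choices is not sorted (hypothetical helper)."""
    a = np.asarray(a)
    choices = np.asarray(choices)
    order = np.argsort(choices)          # indices that would sort choices
    # With sorter=order, the returned positions refer to the sorted order.
    lpos = np.searchsorted(choices, a, 'left', sorter=order).clip(max=choices.size - 1)
    rpos = (np.searchsorted(choices, a, 'right', sorter=order) - 1).clip(min=0)
    # Map positions in the sorted order back to indices into choices.
    cl = choices[order[lpos]]
    cr = choices[order[rpos]]
    # Keep the left neighbour only where it is strictly closer.
    return np.where(np.abs(a - cl) > np.abs(a - cr), cr, cl)

a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
print(searchsorted_unsorted(a, np.array([10, 1, 5])))
# [[ 1  1  1]
#  [ 5  5  5]
#  [10 10 10]]
```

On an exact tie this variant keeps the larger of the two neighbouring choices, which can differ from the argmin-based version.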

Runtime test -

In [160]: # Setup inputs
     ...: a = np.random.rand(100,100)
     ...: choices = np.sort(np.random.rand(100))
     ...: 

In [161]: def broadcasting_app(a, choices): # @wwii's solution
     ...:     return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
     ...: 

In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True

In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop

In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
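
The memory side of that comparison can be estimated with a short sketch using the same input sizes as the timing setup. The names dist and idx are illustrative.

```python
import numpy as np

a = np.random.rand(100, 100)
choices = np.sort(np.random.rand(100))

# The broadcasting approach materialises one distance per (element, choice) pair.
dist = np.abs(a[:, :, None] - choices)
print(dist.shape)                 # (100, 100, 100)
print(dist.nbytes / a.nbytes)     # 100.0, i.e. len(choices) times the input size

# np.searchsorted returns one index per element of a instead.
idx = np.searchsorted(choices, a)
print(idx.shape)                  # (100, 100)
```

The temporary array in the broadcasting version grows with a.size * choices.size, whereas the searchsorted intermediates stay the same shape as a.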

Related post: Find elements of array one nearest to elements of array two
