Fill zero values of a 1d numpy array with the last non-zero value

Time: 2022-12-05 21:25:54

Let's say we have a 1d numpy array filled with some int values. And let's say that some of them are 0.

Is there any way, using numpy array's power, to fill all the 0 values with the last non-zero values found?

For example:

import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
fill_zeros_with_last(arr)  # modifies arr in place
print(arr)

[1 1 1 2 2 4 6 8 8 8 8 8 2]

A way to do it would be with this function:

def fill_zeros_with_last(arr):
    last_val = None # I don't really care about the initial value
    for i in range(arr.size):
        if arr[i]:
            last_val = arr[i]
        elif last_val is not None:
            arr[i] = last_val

However, this is using a raw python for loop instead of taking advantage of the numpy and scipy power.

If we knew that a reasonably small number of consecutive zeros are possible, we could use something based on numpy.roll. The problem is that the number of consecutive zeros is potentially large...
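
A minimal sketch of what such a numpy.roll-based approach could look like, assuming an upper bound on the run length is known in advance. The name fill_zeros_bounded and the parameter max_run are hypothetical and only serve the illustration:

```python
import numpy as np

def fill_zeros_bounded(arr, max_run):
    """Forward-fill zeros, assuming no run of zeros is longer than max_run."""
    out = arr.copy()
    for _ in range(max_run):
        shifted = np.roll(out, 1)   # shifted[i] == out[i - 1]
        shifted[0] = out[0]         # undo the wrap-around of the last element
        mask = out == 0
        out[mask] = shifted[mask]   # each pass fills one more zero per run
    return out
```

Each pass costs one full sweep over the array, so the total work grows with max_run, which is why this only pays off when runs are short.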

Any ideas? Or should we go straight to Cython?

Disclaimer:

I believe that long ago I found a question on * asking something like this or very similar, but I wasn't able to find it again. :-(

Maybe I missed the right search terms, sorry for the duplicate then. Maybe it was just my imagination...

3 solutions

#1


15  

Here's a solution using np.maximum.accumulate:

def fill_zeros_with_last(arr):
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    return arr[prev]

We construct an array prev which has the same length as arr, and such that prev[i] is the index of the last non-zero entry at or before the i-th entry of arr (or 0 if there is no such entry). For example, if:

>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

Then prev looks like:

array([ 0,  0,  0,  3,  3,  5,  6,  7,  7,  7,  7,  7, 12])
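
To see where those values come from, the construction can be traced one step at a time. This is just the body of the function above unrolled, with the intermediate array named idx for illustration:

```python
import numpy as np

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

idx = np.arange(len(arr))           # every position starts as its own index
idx[arr == 0] = 0                   # positions holding a zero contribute index 0
prev = np.maximum.accumulate(idx)   # running maximum = last non-zero index so far
filled = arr[prev]                  # look up the value stored at that index

print(idx)     # [ 0  0  0  3  0  5  6  7  0  0  0  0 12]
print(prev)    # [ 0  0  0  3  3  5  6  7  7  7  7  7 12]
print(filled)  # [1 1 1 2 2 4 6 8 8 8 8 8 2]
```

The running maximum works because indices only grow from left to right, so the largest index seen so far is always the most recent non-zero position.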

Then we just index into arr with prev and we obtain our result. A test:

>>> arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
>>> fill_zeros_with_last(arr)
array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])

Note: Be careful to understand what this does when the first entry of your array is zero:

>>> fill_zeros_with_last(np.array([0,0,1,0,0]))
array([0, 0, 1, 1, 1])
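
The leading zeros survive because prev is 0 there and arr[0] is itself zero. If a different value is wanted for them, one possible variant is sketched below. The name fill_zeros_with_last_or and the parameter initial are assumptions for this illustration, not part of the answer above:

```python
import numpy as np

def fill_zeros_with_last_or(arr, initial):
    """Forward-fill zeros; leading zeros (no earlier non-zero) become `initial`."""
    prev = np.arange(len(arr))
    prev[arr == 0] = 0
    prev = np.maximum.accumulate(prev)
    out = arr[prev]                 # fancy indexing returns a new array
    out[out == 0] = initial         # the only zeros left are the leading ones
    return out

print(fill_zeros_with_last_or(np.array([0, 0, 1, 0, 0]), -1))  # [-1 -1  1  1  1]
```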

#2


4  

Inspired by jme's answer here and by Bas Swinckels' (in the linked question) I came up with a different combination of numpy functions:

def fill_zeros_with_last(arr, initial=0):
    ind = np.nonzero(arr)[0]
    cnt = np.cumsum(np.array(arr, dtype=bool))
    return np.where(cnt, arr[ind[cnt-1]], initial)

I think it's succinct and also works, so I'm posting it here for the record. Still, jme's is also succinct and easy to follow and seems to be faster, so I'm accepting it :-)
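
For readers who want to follow how the three lines fit together, here is a trace on a small input chosen for this illustration (the sample array and the value 9 for initial are not from the answer):

```python
import numpy as np

arr = np.array([0, 0, 1, 0, 3, 0])

ind = np.nonzero(arr)[0]                     # indices of the non-zero entries
cnt = np.cumsum(np.array(arr, dtype=bool))   # non-zero entries seen so far
last = ind[cnt - 1]                          # index of the latest non-zero entry
result = np.where(cnt, arr[last], 9)         # where cnt == 0, use initial (9 here)

print(ind)     # [2 4]
print(cnt)     # [0 0 1 1 2 2]
print(result)  # [9 9 1 1 3 3]
```

Where cnt is 0, the lookup ind[cnt - 1] wraps around to the last non-zero index, but np.where discards that value in favour of initial. An all-zero input leaves ind empty, so that case is probably worth checking separately before relying on this version.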

#3


1  

If the 0s only occur singly (runs of length 1), this use of nonzero might work:

In [266]: arr=np.array([1,0,2,3,0,4,0,5])
In [267]: I=np.nonzero(arr==0)[0]
In [268]: arr[I] = arr[I-1]
In [269]: arr
Out[269]: array([1, 1, 2, 3, 3, 4, 4, 5])

I can handle your arr by applying this repeatedly until I is empty.

In [286]: arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])

In [287]: while True:
   .....:     I=np.nonzero(arr==0)[0]
   .....:     if len(I)==0: break
   .....:     arr[I] = arr[I-1]
   .....:     

In [288]: arr
Out[288]: array([1, 1, 1, 2, 2, 4, 6, 8, 8, 8, 8, 8, 2])
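
One caveat with the loop as written: if arr[0] is zero, then I contains 0 and arr[I-1] reads arr[-1], so the value wraps around from the end of the array (and an all-zero array would never terminate). A guarded version might look like the sketch below. The function name fill_zeros_iterative is hypothetical, and unlike the session above it works on a copy:

```python
import numpy as np

def fill_zeros_iterative(arr):
    """Repeatedly copy the left neighbour into zeros until nothing changes."""
    out = arr.copy()
    while True:
        I = np.nonzero(out == 0)[0]
        I = I[I > 0]                # position 0 has no left neighbour
        I = I[out[I - 1] != 0]      # keep only zeros that can actually be filled
        if len(I) == 0:
            break
        out[I] = out[I - 1]
    return out

print(fill_zeros_iterative(np.array([0, 0, 1, 0, 0])))  # [0 0 1 1 1]
```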

If the strings of 0s are long it might be better to look for those strings and handle them as a block. But if most strings are short, this repeated application may be the fastest route.
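
One way the block idea could be realized, offered as a sketch rather than as what the answer had in mind: treat each non-zero value together with the zeros that follow it as one block, and expand the value over the block with np.repeat. The function name fill_zeros_by_blocks is hypothetical:

```python
import numpy as np

def fill_zeros_by_blocks(arr):
    """Forward-fill zeros by repeating each non-zero value over its whole block."""
    ind = np.flatnonzero(arr)                        # start index of each block
    if ind.size == 0:
        return arr.copy()                            # nothing to fill from
    run_lengths = np.diff(np.append(ind, arr.size))  # block length per non-zero
    filled = np.repeat(arr[ind], run_lengths)
    return np.concatenate([arr[:ind[0]], filled])    # leading zeros stay as they are

arr = np.array([1, 0, 0, 2, 0, 4, 6, 8, 0, 0, 0, 0, 2])
print(fill_zeros_by_blocks(arr))  # [1 1 1 2 2 4 6 8 8 8 8 8 2]
```

This does a fixed number of array passes regardless of how long the runs are, in contrast to the repeated application above, whose pass count equals the longest run.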
