How to find cluster sizes in a 2D numpy array?

Time: 2021-11-30 01:44:18

My problem is the following.

I have a 2D numpy array filled with 0s and 1s, with an absorbing boundary condition (all the outer elements are 0), for example:

[[0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 1 0 1 0 0 0 1 0]
 [0 0 0 0 0 0 1 0 1 0]
 [0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 1 0 1 0 0 0]
 [0 0 0 0 0 1 1 0 0 0]
 [0 0 0 1 0 1 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]

I want to create a function that takes this array and its linear dimension L as input parameters (in this case L = 10) and returns the list of cluster sizes of this array.

By "clusters" I mean the isolated groups of elements equal to 1 in the array.

The array element [i][j] is isolated if all its neighbours are zeros, and its neighbours are the elements [i-1][j], [i][j-1], [i+1][j] and [i][j+1].


So in the previous array we have 7 clusters of sizes (2,1,2,6,1,1,1)


I tried to complete this task by creating two functions, the first one is a recursive function:


def clust_size(array,i,j):

    count = 0

    if array[i][j] == 1:

        array[i][j] = 0

        if array[i-1][j] == 1:

            count += 1
            array[i-1][j] = 0

        elif array[i][j-1] == 1:

            count += 1
            array[i-1][j] = 0

        elif array[i+1][j] == 1:

            count += 1
            array[i-1][j] =  0

        elif array[i][j+1] == 1:

            count += 1
            array[i-1][j] = 0

    return count+1         

It should return the size of one cluster. Every time the function finds an array element equal to 1, it increases the counter "count" and sets the element to 0, so that each '1' element is counted only once. If one of the neighbours of the element is equal to 1, the function calls itself on that element.


The second function is:


def clust_list(array,L):

    sizes_list = []

    for i in range(1,L-1):
        for i in range(1,L-1):

           count = clust_size(array,i,j)


    return sizes_list

It should return the list containing the cluster sizes. The for loops iterate over range(1, L-1), skipping the border, because all the outer elements are 0.

This doesn't work and I can't see where the error is...


I was wondering if maybe there's an easier way to do it.


4 solutions



This looks like a percolation problem. If you have scipy installed, scipy.ndimage can do this for you.

import numpy as np
from scipy import ndimage

# the array from the question
z2 = np.array([[0,0,0,0,0,0,0,0,0,0],
               [0,0,1,0,0,0,0,0,0,0],
               [0,0,1,0,1,0,0,0,1,0],
               [0,0,0,0,0,0,1,0,1,0],
               [0,0,0,0,0,0,1,0,0,0],
               [0,0,0,0,1,0,1,0,0,0],
               [0,0,0,0,0,1,1,0,0,0],
               [0,0,0,1,0,1,0,0,0,0],
               [0,0,0,0,1,0,0,0,0,0],
               [0,0,0,0,0,0,0,0,0,0]])

This will identify the clusters:

lw, num = ndimage.label(z2)
print(lw)
[[0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 1 0 2 0 0 0 3 0]
 [0 0 0 0 0 0 4 0 3 0]
 [0 0 0 0 0 0 4 0 0 0]
 [0 0 0 0 5 0 4 0 0 0]
 [0 0 0 0 0 4 4 0 0 0]
 [0 0 0 6 0 4 0 0 0 0]
 [0 0 0 0 7 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]

The following will calculate their area (index 0 is the background):

area = ndimage.sum(z2, lw, index=np.arange(lw.max() + 1))
print(area)
[0. 2. 1. 2. 6. 1. 1. 1.]

This gives what you expect, although judging by eye I would have expected a cluster with 8 members.
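
As a small sketch of an alternative to the sum call above: once the labels exist, the cluster sizes can also be read off by counting how often each label occurs, for instance with np.bincount. The names grid and cluster_sizes are made up for this illustration, and it assumes a SciPy version that exposes the labelling function as scipy.ndimage.label.

```python
import numpy as np
from scipy import ndimage

# The array from the question (1 = occupied site, 0 = empty site).
grid = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])


def cluster_sizes(a):
    """Return the size of each cluster of 1s, in label order (hypothetical helper)."""
    # With the default structuring element, label() joins only
    # horizontally and vertically adjacent cells in 2D.
    labels, num = ndimage.label(a)
    # bincount counts how often each label occurs; index 0 is the
    # background, so it is dropped.
    return np.bincount(labels.ravel(), minlength=num + 1)[1:].tolist()


print(cluster_sizes(grid))
```

Counting labels this way only works because every occupied cell has the value 1; for weighted cells the sum over labels would be the right tool.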




I feel your problem of finding "clusters" is essentially the problem of finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. Several algorithms for identifying the connected components (or "clusters", as you defined them) are described on Wikipedia.

Once the connected components or "clusters" are labelled, you can easily find any information you want, including the area and the relative position.
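
To make the idea concrete, here is a minimal sketch of one such algorithm: a breadth-first flood fill with 4-connectivity, written with only numpy and the standard library. It is just one of several possible labelling strategies, and the names label_clusters and grid are made up for this illustration.

```python
from collections import deque

import numpy as np


def label_clusters(a):
    """Label 4-connected clusters of 1s and return (labels, sizes)."""
    a = np.asarray(a)
    rows, cols = a.shape
    labels = np.zeros((rows, cols), dtype=int)  # 0 means "not labelled"
    sizes = []
    for i in range(rows):
        for j in range(cols):
            if a[i, j] != 1 or labels[i, j] != 0:
                continue  # empty cell, or already part of a cluster
            current = len(sizes) + 1  # next free label
            labels[i, j] = current
            queue = deque([(i, j)])
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                # Visit the four direct neighbours (no diagonals).
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and a[nr, nc] == 1 and labels[nr, nc] == 0):
                        labels[nr, nc] = current
                        queue.append((nr, nc))
            sizes.append(size)
    return labels, sizes


# The array from the question.
grid = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

labels, sizes = label_clusters(grid)
print(sizes)  # prints [2, 1, 2, 6, 1, 1, 1]
```

The explicit queue avoids deep recursion, so the sketch also copes with large clusters, and the bounds check means it does not rely on the border being all zeros.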




I believe that your approach is almost correct, except that you reinitialize the variable count every time you recursively call your function clust_size. I would add count to the input parameters of clust_size and reset it with count = 0 only before each first call in your nested for loops.

That way you would always call clust_size as count = clust_size(array, i, j, count). I haven't tested it, but it seems to me that it should work.

Hope it helps.
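
A rough sketch of how that suggestion might look in practice follows. It is an illustration rather than the answerer's own code, and it goes a little beyond passing count around: it actually recurses into each neighbour, checks all four neighbours instead of stopping at the first match (the elif chain), uses j for the inner loop, and appends each non-zero size to the list. It assumes, as in the question, that all border elements are 0.

```python
import numpy as np


def clust_size(array, i, j, count=0):
    """Add the size of the cluster containing array[i][j] to count.

    Visited cells are set to 0 so that each one is counted only once.
    """
    if array[i][j] != 1:
        return count
    array[i][j] = 0
    count += 1
    # Recurse into all four neighbours; the zero border stops the
    # recursion before an index can leave the array.
    for ni, nj in ((i - 1, j), (i, j - 1), (i + 1, j), (i, j + 1)):
        count = clust_size(array, ni, nj, count)
    return count


def clust_list(array, L):
    """Return the list of cluster sizes of an L x L array with a zero border."""
    work = np.array(array)  # work on a copy so the caller's array is kept
    sizes_list = []
    for i in range(1, L - 1):
        for j in range(1, L - 1):
            count = clust_size(work, i, j, 0)
            if count > 0:
                sizes_list.append(count)
    return sizes_list


# The array from the question.
grid = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                 [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

print(clust_list(grid, 10))  # prints [2, 1, 2, 6, 1, 1, 1]
```

For very large clusters the recursion can hit Python's recursion limit, in which case an iterative flood fill is the safer choice.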




A relatively simple problem if you convert the array to a string:

import numpy as np
arr = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 1, 1, 1, 1, 1, 1, 0],   # modified
                [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# flatten to one string of digits, then split on the zeros
arr = "".join(str(x) for x in arr.reshape(-1))
print([len(x) for x in arr.replace("0", " ").split()])

[1, 7, 1, 1, 1, 1, 1, 2, 1, 1, 1] # Cluster sizes (lengths of the horizontal runs of 1s)


