How to find cluster sizes in a 2D numpy array?

My problem is the following:

I have a 2D numpy array filled with 0s and 1s, with an absorbing boundary condition (all the outer elements are 0), for example:

[[0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 1 0 1 0 0 0 1 0]
 [0 0 0 0 0 0 1 0 1 0]
 [0 0 0 0 0 0 1 0 0 0]
 [0 0 0 0 1 0 1 0 0 0]
 [0 0 0 0 0 1 1 0 0 0]
 [0 0 0 1 0 1 0 0 0 0]
 [0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]

I want to create a function that takes this array and its linear dimension L as input parameters (in this case L = 10) and returns the list of cluster sizes of this array.

By "clusters" I mean the isolated groups of elements equal to 1 in the array.

The array element [i][j] is isolated if all its neighbours are zeros, and its neighbours are the elements:

[i+1][j]
[i-1][j]
[i][j+1]
[i][j-1]

So in the previous array we have 7 clusters, of sizes (2, 1, 2, 6, 1, 1, 1).

I tried to complete this task by creating two functions. The first one is a recursive function:

def clust_size(array,i,j):

    count = 0

    if array[i][j] == 1:

        array[i][j] = 0

        if array[i-1][j] == 1:

            count += 1
            array[i-1][j] = 0
            clust_size(array,i-1,j)

        elif array[i][j-1] == 1:

            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j-1)

        elif array[i+1][j] == 1:

            count += 1
            array[i-1][j] =  0
            clust_size(array,i+1,j)

        elif array[i][j+1] == 1:

            count += 1
            array[i-1][j] = 0
            clust_size(array,i,j+1)

    return count+1

It should return the size of one cluster. Every time the function finds an array element equal to 1, it increases the value of the counter "count" and changes the value of the element to 0; in this way each '1' element is counted just once. If one of the neighbours of the element is equal to 1, then the function calls itself on that element.

The second function is:

def clust_list(array,L):

    sizes_list = []

    for i in range(1,L-1):
        for i in range(1,L-1):

           count = clust_size(array,i,j)

           sizes_list.append(count)

    return sizes_list

It should return the list containing the cluster sizes. The for loops iterate over range(1, L-1), skipping the border, because all the outer elements are 0.

This doesn't work and I can't see where the error is.

I was wondering if maybe there's an easier way to do it.

4 answers

#1

It seems like a percolation problem. The following link has your answer if you have scipy installed.

http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/

import numpy as np
from scipy import ndimage

z2 = np.array([[0,0,0,0,0,0,0,0,0,0],
    [0,0,1,0,0,0,0,0,0,0],
    [0,0,1,0,1,0,0,0,1,0],
    [0,0,0,0,0,0,1,0,1,0],
    [0,0,0,0,0,0,1,0,0,0],
    [0,0,0,0,1,0,1,0,0,0],
    [0,0,0,0,0,1,1,0,0,0],
    [0,0,0,1,0,1,0,0,0,0],
    [0,0,0,0,1,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0]])

This will identify the clusters:

# lw holds one integer label per cluster, num is the number of clusters
lw, num = ndimage.label(z2)
print(lw)
[[0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 1 0 2 0 0 0 3 0]
 [0 0 0 0 0 0 4 0 3 0]
 [0 0 0 0 0 0 4 0 0 0]
 [0 0 0 0 5 0 4 0 0 0]
 [0 0 0 0 0 4 4 0 0 0]
 [0 0 0 6 0 4 0 0 0 0]
 [0 0 0 0 7 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0]]

The following will calculate their area.

# Sum the 1s under each label; index 0 is the background
area = ndimage.sum(z2, lw, index=np.arange(lw.max() + 1))
print(area)
[0. 2. 1. 2. 6. 1. 1. 1.]

This gives what you expect, although judging by eye I would have thought there was a cluster with 8 members.
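If only the sizes are needed, counting the labels directly may be a little shorter. The sketch below is an addition, not part of the linked article: it assumes numpy and a reasonably recent scipy, uses np.bincount on the label array, and passes a full 3x3 structure to ndimage.label to show how the result changes when diagonal neighbours are treated as connected. The names lw4, lw8, sizes4 and sizes8 are made up for illustration.

```python
import numpy as np
from scipy import ndimage

z2 = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
               [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
               [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
               [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
               [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Default structuring element: 4-connectivity (up, down, left, right).
lw4, num4 = ndimage.label(z2)
sizes4 = np.bincount(lw4.ravel())[1:]   # drop label 0 (the background)

# A full 3x3 structuring element also links diagonal neighbours.
lw8, num8 = ndimage.label(z2, structure=np.ones((3, 3), dtype=int))
sizes8 = np.bincount(lw8.ravel())[1:]

print(sizes4.tolist())  # [2, 1, 2, 6, 1, 1, 1]
print(sizes8.tolist())  # [2, 1, 2, 9]
```

With the neighbour definition from the question (no diagonals), the default structure is the one that applies.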

#2

I feel your problem of finding "clusters" is essentially the same problem as finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. You can see several algorithms to identify the connected components (or "clusters", as you defined them) on this Wikipedia page:

http://en.wikipedia.org/wiki/Connected-component_labeling

Once the connected components or "clusters" are labelled, you can easily find any information you want about them, including their area and relative position.
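As one illustration of such an algorithm, here is a minimal sketch of a breadth-first flood fill with 4-connectivity. It is only one of several possible approaches, it is not taken from the Wikipedia page, and the function name cluster_sizes is invented for this example. It needs only the standard library and accepts a nested list or a 2D numpy array.

```python
from collections import deque


def cluster_sizes(grid):
    """Return the sizes of the 4-connected clusters of 1s, in raster-scan order."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != 1 or seen[i][j]:
                continue
            # New cluster found: flood-fill it with an explicit queue.
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            sizes.append(size)
    return sizes


example = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
           [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

print(cluster_sizes(example))  # [2, 1, 2, 6, 1, 1, 1]
```

Using a queue instead of recursion avoids hitting Python's recursion limit on large clusters, and the separate seen grid leaves the input unmodified.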

#3

I believe that your approach is almost correct, except that you are initializing the variable count over and over again whenever you recursively call your function clust_size. I would add the count variable to the input parameters of clust_size and just reinitialize it with count = 0 for every first call in your nested for loops.

That way, you would always call clust_size like count = clust_size(array, i, j, count). I haven't tested it, but it seems to me that it should work.

Hope it helps.
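A possible version of that idea is sketched below. It goes slightly beyond the suggestion above, so treat it as an assumption about what a working fix could look like: besides threading count through the calls, it checks all four neighbours instead of stopping at the first match (the elif chain), uses j for the inner loop, and only records non-empty clusters.

```python
def clust_size(array, i, j, count=0):
    """Count the cells of the cluster containing (i, j), zeroing them as we go."""
    if array[i][j] == 1:
        array[i][j] = 0          # mark as visited so it is counted only once
        count += 1
        # Visit every neighbour, not just the first one that is set.
        for ni, nj in ((i - 1, j), (i, j - 1), (i + 1, j), (i, j + 1)):
            count = clust_size(array, ni, nj, count)
    return count


def clust_list(array, L):
    """Return the cluster sizes of an L x L array whose border is all zeros."""
    sizes_list = []
    for i in range(1, L - 1):
        for j in range(1, L - 1):
            count = clust_size(array, i, j, 0)
            if count > 0:        # skip cells that were 0 or already consumed
                sizes_list.append(count)
    return sizes_list


example = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
           [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

work = [row[:] for row in example]   # clust_size overwrites its input
result = clust_list(work, 10)
print(result)  # [2, 1, 2, 6, 1, 1, 1]
```

Note that this relies on the all-zero border to stay in bounds, and that very large clusters could exceed Python's default recursion limit.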

#4

This is a relatively simple problem if you convert the array to a string:

import numpy as np
arr = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 1, 1, 1, 1, 1, 1, 0],   # modified
                [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
                [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
                [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# Flatten the array into one string of "0" and "1" characters
arr = "".join([str(x) for x in arr.reshape(-1)])
# Turn zeros into spaces, split on whitespace, and measure each run of 1s
print([len(x) for x in arr.replace("0", " ").split()])

Output:

[1, 7, 1, 1, 1, 1, 1, 2, 1, 1, 1] #Cluster sizes
