How do I find the most common int in a 2D array?

OK, so I'm just starting to think about how to implement a new graphical plugin for Paint.NET, and I need to know how to find the most common integer in a 2D array of integers. Is there a built-in way to do this in C#? Or does anyone have a slick way to do it?

The array will look something like this:

300 300 300 300 300 300 300
  0 150 300 300 300 300 300
  0   0 150 300 300 300 300
  0   0   0   0 300 300 300
  0   0   0   0 150 300 300
  0   0   0   0   0 150 300
  0   0   0   0   0   0 300

I would need to know that 300 is the most common number in the array. If there is no "most common" number, then just return the center number (the array dimensions will always be odd x odd), which is 0 here.

I'll be implementing this using a "brute force" algorithm unless you experts can come up with something faster.

Any help would be very much appreciated.

Thanks!

EDIT: More info...

The values will almost always be VERY diverse (more diverse than my example array). The values will be in the range of 0-360. The size of the array will be 5x5 to about 17x17, depending on the speed of the algorithm. The result will be calculated once for each pixel in a large image... so faster is better. ;)

8 Solutions

#1

Take a look at the LocalHistogramEffect code in Paint.NET, notably LocalHistogramEffect.RenderRect.

It walks the input image, maintaining a histogram of intensities for each source pixel within 'r' pixels of the destination pixel. As the output pixels are traversed, it adds the leading edge to the histogram and subtracts the trailing edge. It handles all the edge cases well, and is quite fast. It's the basis for the Median, Unfocus, Outline, and Remove Noise effects.

Adapting this to support Hue instead of RGB intensity would be rather trivial.

The performance is quite good, and for your purposes it operates in O(r^2+wr+nw), where r is the radius, w is the width of the image, and n is the number of levels in the histogram.

-tjackson

#2

It's at least O(n*m) any way you slice it -- you are going to have to look at each cell at least once. The place to economize is in where you accumulate the counts of each value before looking for the most common; if your integers vary over a relatively small range (they are uint16, let's say), then you might be able to simply use a flat array instead of a map.

I guess you could also keep running counts x,y of the current top and runner-up candidates for "most common" and early-out as soon as fewer than (x-y) cells are left to look at, since at that point there's no way the runner-up could outpace the top candidate.
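
The early-out idea can be sketched roughly as follows. This is only an illustration, not code from the answer: the class and method names are made up, values are assumed to lie in 0..360, and -1 is used here as an arbitrary marker for "no single most common value".

```csharp
using System;

static class EarlyOutMode
{
    // Hypothetical helper: returns the most common value in the window,
    // stopping as soon as the leader can no longer be caught.
    // Returns -1 if the top two counts end up tied (or the window is empty).
    public static int MostCommon(int[,] window)
    {
        var counts = new int[361];      // assumes values in 0..360
        int total = window.Length;      // n * m cells
        int seen = 0;
        int best = -1, bestCount = 0;   // current leader and its count (x)
        int secondCount = 0;            // highest count among all rivals (y)

        foreach (int v in window)
        {
            int c = ++counts[v];
            seen++;
            if (v == best) bestCount = c;
            else if (c > bestCount) { secondCount = bestCount; best = v; bestCount = c; }
            else if (c > secondCount) secondCount = c;

            // Even if every remaining cell went to a rival,
            // no rival could reach the leader's count.
            if (total - seen < bestCount - secondCount) return best;
        }
        return -1; // tie for first place, or empty window
    }
}
```

With very diverse values the lead is usually small, so the early exit may rarely trigger; whether the extra bookkeeping pays off would need measuring.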

Integer ops like this are pretty fast; even for a megapixel image the brute force algorithm should only take a couple milliseconds.

I notice you've edited your original question to say that the pixel values range from 0..255 -- in that case, definitely go with a simple flat array; that's small enough to easily fit into the L1 dcache, and a lookup in a flat array is très quick.

[edit]: Dealing with the "no most common number" case is very simple once you've built the histogram array: all you have to do is walk through it to find the "most" and "second most" common numbers; if they're equally frequent, then by definition there is no single most common one.

const int numLevels = 361; // the question says each cell contains a number in 0..360
int levelFrequencyCounts[numLevels]; // assume this has been populated such that levelFrequencyCounts[i] = number of cells containing "i"
int mostCommon = 0;    // level with the highest count seen so far
int runnerUpCount = 0; // highest count among all the other levels
for (int i = 1 ; i < numLevels ; ++i)
{
  if ( levelFrequencyCounts[i] > levelFrequencyCounts[mostCommon] )
  {
    runnerUpCount = levelFrequencyCounts[mostCommon];
    mostCommon = i;
  }
  else if ( levelFrequencyCounts[i] > runnerUpCount )
  {
    runnerUpCount = levelFrequencyCounts[i];
  }
}

if ( levelFrequencyCounts[mostCommon] != runnerUpCount )
{
   return mostCommon;
}
else
{
   return CenterOfInputData; // (something like InputData[n/2][m/2])
}

#3

how would I do something like this in C#?

Something like this:

Dictionary<int, int> d = new Dictionary<int, int>();
foreach (int value in matrix)
{
 if (!d.ContainsKey(value))
  d.Add(value, 1);
 else
  d[value] = d[value] + 1;
}
// KeyValuePair is a struct, so use a nullable to represent "nothing found yet"
KeyValuePair<int, int>? biggest = null;
foreach (KeyValuePair<int, int> found in d)
{
  if (!biggest.HasValue || (biggest.Value.Value < found.Value))
    biggest = found;
}

#4

One option is LINQ - a bit inefficient, but OK for non-huge arrays:

    var max = (from cell in data.Cast<int>()
               group cell by cell into grp
               select new { Key = grp.Key, Count = grp.Count() } into agg
               orderby agg.Count descending
               select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

Or with a jagged array:

    var max = (from row in data
              from cell in row
              group cell by cell into grp
              select new {Key = grp.Key, Count = grp.Count()} into agg
              orderby agg.Count descending
              select agg).First();
    Console.WriteLine(max.Key + ": " + max.Count);

In reality, I would probably use a dictionary/count. Here is an example without LINQ, just "because":

    Dictionary<int, int> counts = new Dictionary<int, int>();
    foreach (int value in data)
    {
        int count;
        counts.TryGetValue(value, out count);
        counts[value] = count + 1;
    }
    int maxCount = -1, maxValue = 0;
    foreach (KeyValuePair<int, int> pair in counts)
    {
        if (pair.Value > maxCount)
        {
            maxCount = pair.Value;
            maxValue = pair.Key;
        }
    }
    Console.WriteLine(maxValue + ": " + maxCount);
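
None of the snippets above handles the question's "no most common value" rule. A minimal sketch of how the dictionary version might be extended, assuming a rectangular `int[,]` with odd dimensions (the class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

static class ModeWithFallback
{
    // Hypothetical helper: most common value, or the center cell on a tie.
    public static int MostCommonOrCenter(int[,] data)
    {
        var counts = new Dictionary<int, int>();
        foreach (int value in data)
        {
            int count;
            counts.TryGetValue(value, out count); // count stays 0 if absent
            counts[value] = count + 1;
        }

        int maxCount = 0, maxValue = 0;
        bool tied = false; // true while two values share the top count
        foreach (KeyValuePair<int, int> pair in counts)
        {
            if (pair.Value > maxCount)
            {
                maxCount = pair.Value;
                maxValue = pair.Key;
                tied = false;
            }
            else if (pair.Value == maxCount)
            {
                tied = true;
            }
        }

        if (!tied) return maxValue;
        // Dimensions are assumed odd x odd, so integer division hits the center.
        return data[data.GetLength(0) / 2, data.GetLength(1) / 2];
    }
}
```

For the 7x7 example in the question this returns 300 (23 cells, against 22 zeros).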

#5

Your image:

300+ 300+ 300+ 300 300 300 300
  0+ 150+ 300+ 300 300 300 300
  0+   0+ 150+ 300 300 300 300
  0    0    0    0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Marked (+) numbers are your window; w,h are your window dimensions. Apply bucket sorting (as other people suggested, since your value range is quite limited). Don't cut your evaluation short halfway as Crashworks suggests. Don't throw away your result yet. This is the first step.

300- 300- 300- 300 300 300 300
  0. 150. 300. 300 300 300 300
  0.   0. 150. 300 300 300 300
  0+   0+   0+   0 300 300 300
  0    0    0    0 150 300 300
  0    0    0    0   0 150 300
  0    0    0    0   0   0 300

Shift your window. Instead of recounting everything, subtract the buckets for the row/column you just left and add the buckets for the new one. This way you examine each pixel 2(w+h) times, i.e. when it crosses the window boundary, instead of w*h times, i.e. while that pixel is in the window, as in a naive implementation.

In other words, you need to move your window like this:

|  ^->|  ^
|  |  |  |
|  |  |  |
V->|  V->|

I assume you are trying to implement a nonlinear convolution filter.

Corrections welcome.
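
A rough sketch of the add/subtract idea, simplified to sliding along a single image row rather than the serpentine path drawn above. Everything here is an assumption for illustration: the names are made up, values are taken to be in 0..360, windows are clipped at the image border, and ties go to the smallest value rather than to the center pixel.

```csharp
using System;

static class SlidingMode
{
    // Hypothetical sketch: for image row y, returns the most common value in
    // the (2r+1) x (2r+1) window around each pixel of that row.
    public static int[] ModeRow(int[,] image, int y, int r)
    {
        int h = image.GetLength(0), w = image.GetLength(1);
        int top = Math.Max(0, y - r), bottom = Math.Min(h - 1, y + r);
        var hist = new int[361];
        var result = new int[w];

        // Build the histogram for the first window (columns 0..r).
        for (int x = 0; x <= Math.Min(r, w - 1); x++)
            AddColumn(image, hist, x, top, bottom, +1);

        for (int x = 0; x < w; x++)
        {
            if (x > 0)
            {
                // Slide right: drop the trailing column, add the leading one.
                int leaving = x - r - 1, entering = x + r;
                if (leaving >= 0) AddColumn(image, hist, leaving, top, bottom, -1);
                if (entering < w) AddColumn(image, hist, entering, top, bottom, +1);
            }
            result[x] = ArgMax(hist);
        }
        return result;
    }

    // Adds (delta = +1) or removes (delta = -1) one window column.
    static void AddColumn(int[,] image, int[] hist, int x, int top, int bottom, int delta)
    {
        for (int row = top; row <= bottom; row++)
            hist[image[row, x]] += delta;
    }

    // Index of the largest bucket; the first one wins on ties.
    static int ArgMax(int[] hist)
    {
        int best = 0;
        for (int i = 1; i < hist.Length; i++)
            if (hist[i] > hist[best]) best = i;
        return best;
    }
}
```

Each step touches only two columns of the window plus one pass over the 361 buckets, instead of recounting the whole window.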

#6

If speed is your primary concern, do not use a dictionary. Stick with a flat array of counters (shorts are enough here, since a 17x17 window has at most 289 cells). Try this:

// stores hit counts (0-360)
short[] hitCounts = new short[361];

// iterate through 2d array and increment hit counts
for (int i = 0; i < toEvaluate.Length; i++)
{
    for (int j = 0; j < toEvaluate[i].Length; j++)
        hitCounts[toEvaluate[i][j]]++;
}

int greatestHitCount = 0; // the hit count of the current greatest value
int greatest = -1; // the current greatest value (-1 means none yet, or a tie)

// iterate through values (0-360) and evaluate hit counts
for (int i = 0; i < hitCounts.Length; i++)
{
    // the hit count of hitCounts[i] is higher than the current greatest hit count value
    if (hitCounts[i] > greatestHitCount)
    {
        greatestHitCount = hitCounts[i]; // store the new hit count
        greatest = i; // store the greatest value
    }
    // there is already a value with the same hit count (which is the greatest)
    else if (hitCounts[i] == greatestHitCount)
        greatest = -1; // there is more than one value, we can't use this if it ends up being the greatest
}

if (greatest >= 0) // a single greatest value was found
    return greatest;

// figure out the middle x and y index (dimensions are odd, so Length / 2 is the center)
int x = toEvaluate.Length / 2;
int y = toEvaluate[x].Length / 2;

// return the value at the center of the 2d array as the value
return toEvaluate[x][y];

When speed becomes a concern over readability, you end up with necessarily ugly code. The above could definitely benefit from refactoring (hence overdoing the comments), but it should run fast. If it isn't fast enough, you can gain even more optimizations by moving it to unmanaged code.

#7

Michael beat me to the post, but I'd do likewise, something like this:

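    int MaxValueIn2dArray(int[,] matrix)
    {
        var d = new int[361]; // one counter per possible value (0-360)
        int maxCount = 0;     // highest count seen so far
        int maxValue = 0;     // the value that has that count
        for (int x = 0; x <= matrix.GetUpperBound(0); x++)
        {
            for (int y = 0; y <= matrix.GetUpperBound(1); y++)
            {
                d[matrix[x, y]]++;
            }
        }
        for (int value = 0; value < d.Length; value++)
        {
            if (d[value] > maxCount)
            {
                maxCount = d[value];
                maxValue = value;
            }
        }
        return maxValue; // the most common value, not its count
    }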

It would need to be optimized for your particular needs.

#8

All I'll offer is that any algorithm that checks every cell (which is pretty much what you'd expect to do) should do two extra things:

1.) Make sure the routine exits when the count for the currently most common value > (M x N / 2). If something has >50% coverage on your grid then it's the most common value, no need to continue. If your routine only needs to be right MOST of the time then you could lower the percentage and treat it as a heuristic. You could even run some analysis that spits out something like if coverage is >37.6% then 99.9% of the time it'll be the most common value and then use that percentage.

2.) If there is any way you can determine in which side, corner or general location (outer edges, middle, etc.) the most common values are likely to be, you could then scan in that order which together with optimization 1 above could shave off a lot of your scanning. For instance in your example the top right is heavy on the common value. If this was determinable by some heuristic you could scan from the top right to the bottom left in some fashion. If the pattern of scanning needed is complex, pre-generate it.
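
Point 1 might look something like the sketch below. It is an illustration under assumptions, not a tuned implementation: the names are hypothetical, values are assumed to lie in 0..360, and ties in the fallback scan go to the smallest value.

```csharp
using System;

static class MajorityEarlyExit
{
    // Hypothetical helper: exits as soon as one value covers more than half the grid.
    public static int MostCommon(int[,] data)
    {
        var counts = new int[361];
        int half = data.Length / 2; // M x N / 2
        foreach (int v in data)
        {
            // A value covering more than half the grid cannot be beaten.
            if (++counts[v] > half) return v;
        }

        // No majority: fall back to a full scan of the histogram.
        int best = 0;
        for (int i = 1; i < counts.Length; i++)
            if (counts[i] > counts[best]) best = i;
        return best;
    }
}
```

For point 2, the `foreach` could be replaced by a loop over a pre-generated list of coordinates in whatever order a heuristic suggests; with the very diverse values described in the question, though, a majority may seldom exist.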
