合并两组不均匀的项目

时间:2021-05-19 00:45:58

I'm trying to get a fairly even distribution of one set of items into another and am looking for an algorithm that can help.

我正试图将一组项目分配到另一组,并且正在寻找可以提供帮助的算法。

For example, Group A has 42 items and Group B has 16 items. I want to mix both groups together so that B is fairly evenly distributed within A. So, the merged group looks something like: {AA B AAA B AA B AA B AAA.....} It would be easy, of course, if A was a multiple of B, but that is not often the case for my needs.

例如,A组有42个项目,B组有16个项目。我想将两个组混合在一起,以便B在A中相当均匀地分布。因此,合并后的组看起来像:{AA B AAA B AA B AA B AAA .....}当然,这很容易如果A是B的倍数,但我的需求通常不是这样。

3 个解决方案

#1


You could start by obtaining the number of items from one set between the items of the other set:

您可以从另一组的项目之间获取一组中的项目数量开始:

float number_between = bigger_set.size() / smaller_set.size();

The iterate over the bigger set, subtracting 1 for each loop from an accumulator (initialized with number_between), inserting an item from the smaller set whenever this accumulator gets below 0, and refreshing it with number_between:

迭代越大的集合,从累加器减去1(用number_between初始化),每当这个累加器低于0时从较小的集合插入一个项目,并用number_between刷新它:

float accumulator = number_between;
foreach(item : bigger_set) {
  result.add(item);
  accumulator = accumulator - 1;
  if (accumulator < 0) {
      result.add(next from smaller_set);
      accumulator = accumulator + number_between;
  } 
} 

EDIT

Change to:

float number_between = (bigger_set.size() +1) / smaller_set.size();

If you want to be sure that the bigger list both starts and ends the result list.

如果您想确保较大的列表同时开始和结束结果列表。

EDIT 2

Beware that using floating point arithmetic may introduce rounding and underflow errors.

请注意,使用浮点运算可能会引入舍入和下溢错误。

For example, if you're using IEEE single precision (mantissa with 24 bit ~ 7 decimal digits) and the bigger list is greater than the smaller list by a factor of 10^7 or more, the line accumulator = accumulator - 1 will underflow (and you'll get a result entirely made by the bigger set and none of the smaller set).

例如,如果您使用的是IEEE单精度(尾数为24位~7位十进制数字),并且较大的列表比较小的列表大10 ^ 7或更多,则行累加器=累加器 - 1将下溢(你会得到一个完全由较大的集合而不是较小的集合产生的结果)。

Also, rounding may lead to an attempt to draw further items from the smaller list when it is exhausted.

此外,舍入可能导致尝试在耗尽时从较小列表中抽取更多项目。

#2


Well, I've been playing around with this and have come up with a solution that will work for my purposes. I essentially mix the larger items into the smaller items and loop back through until I run out of larger items.

好吧,我一直在玩这个,并提出了一个适合我的目的的解决方案。我基本上将较大的项目混合到较小的项目中并循环回来,直到我用完较大的项目。

For Each item In smallerList
  mergedList.add(smallerID)
Next

itemsRemaining = biggerList.Count

While itemsRemaining > 0
  index = 0

  For i = 1 To smallerList.Count
    If index >= mergedList.Count or itemsRemaining = 0 Then Continue While

    mergedList.Insert(index , largerID)
    index += 2 + loopCount
    itemsRemaining -= 1
  Next

  loopCount += 1
End While

Then I can replace the IDs with the actual items from the two lists.

然后我可以用两个列表中的实际项替换ID。

So, for my original example (Group 1 with 42 items and Group 2 with 16), I end up with:

因此,对于我的原始示例(第1组包含42个项目,第2组包含16个项目),我最终得到:

111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 11 2 11 2 11 2 11 2 11 2 11 2

111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 11 2 11 2 11 2 11 2 11 2 11 2

It's a bit front loaded, but for my purposes, this will work out just fine.

这有点前端,但为了我的目的,这将很好。

#3


1) You could concatenate the two groups and do simple sampling from the combined group, for instance by shuffling the elements and iterating through the shuffled combined set.

1)您可以连接两个组并从组合组中进行简单采样,例如通过对元素进行混洗并迭代混合组合集。

2) If you'd rather do it sequentialy you could sample from each group with probabilities size(A) / (size(A) + size(B)) and size(B) / (size(A) + size(B)), where size(A) and size(B) are the current numbers of elements in groups A and B, respectively, that haven't yet been sampled. In other words, if U is a draw from a Uniform(0,1) random number generator:

2)如果您更愿意按顺序进行,您可以从每组中采样概率大小(A)/(大小(A)+大小(B))和大小(B)/(大小(A)+大小(B) ),其中尺寸(A)和尺寸(B)分别是尚未采样的A组和B组中的当前元素数。换句话说,如果U是来自Uniform(0,1)随机数生成器的绘制:

if U <= size(A) / (size(A) + size(B))
   randomly draw next observation from A
else
   randomly draw next observation from B

In both approaches the A's and the B's end up uniformly distributed across the range, which is the statistical description of a "fairly even distribution".

在这两种方法中,A和B最终均匀分布在整个范围内,这是“相当均匀分布”的统计描述。

You didn't specify a language, so here are concrete implementations of both approaches in Ruby. I've cut the set sizes in half to keep the output length reasonable, and obviously these will both produce different results each time they're run due to the use of randomness.

您没有指定语言,因此这里是Ruby中两种方法的具体实现。我将设置的大小减半,以保持输出长度合理,显然,由于使用随机性,每次运行时都会产生不同的结果。

First approach:

a = ['A'] * 21
b = ['B'] * 8
c = (a + b).shuffle
puts c.join(',')

which, for example, produced the following output:

例如,它产生以下输出:

A,A,A,A,A,B,A,A,A,A,A,B,B,B,A,B,A,A,A,A,A,A,A,A,A,B,B,A,B

Second approach:

a = ['A'] * 21
b = ['B'] * 8
c = []    
while a.length > 0 || b.length > 0
  c << (rand <= (a.length / (a.length + b.length).to_f) ? a.shift : b.shift)
end    
puts c.join(',')

which, for example, produced the following output:

例如,它产生以下输出:

A,A,B,A,A,A,B,B,A,A,A,A,A,A,A,B,B,B,A,A,A,B,A,A,A,B,A,A,A

#1


You could start by obtaining the number of items from one set between the items of the other set:

您可以从另一组的项目之间获取一组中的项目数量开始:

float number_between = bigger_set.size() / smaller_set.size();

The iterate over the bigger set, subtracting 1 for each loop from an accumulator (initialized with number_between), inserting an item from the smaller set whenever this accumulator gets below 0, and refreshing it with number_between:

迭代越大的集合,从累加器减去1(用number_between初始化),每当这个累加器低于0时从较小的集合插入一个项目,并用number_between刷新它:

float accumulator = number_between;
foreach(item : bigger_set) {
  result.add(item);
  accumulator = accumulator - 1;
  if (accumulator < 0) {
      result.add(next from smaller_set);
      accumulator = accumulator + number_between;
  } 
} 

EDIT

Change to:

float number_between = (bigger_set.size() +1) / smaller_set.size();

If you want to be sure that the bigger list both starts and ends the result list.

如果您想确保较大的列表同时开始和结束结果列表。

EDIT 2

Beware that using floating point arithmetic may introduce rounding and underflow errors.

请注意,使用浮点运算可能会引入舍入和下溢错误。

For example, if you're using IEEE single precision (mantissa with 24 bit ~ 7 decimal digits) and the bigger list is greater than the smaller list by a factor of 10^7 or more, the line accumulator = accumulator - 1 will underflow (and you'll get a result entirely made by the bigger set and none of the smaller set).

例如,如果您使用的是IEEE单精度(尾数为24位~7位十进制数字),并且较大的列表比较小的列表大10 ^ 7或更多,则行累加器=累加器 - 1将下溢(你会得到一个完全由较大的集合而不是较小的集合产生的结果)。

Also, rounding may lead to an attempt to draw further items from the smaller list when it is exhausted.

此外,舍入可能导致尝试在耗尽时从较小列表中抽取更多项目。

#2


Well, I've been playing around with this and have come up with a solution that will work for my purposes. I essentially mix the larger items into the smaller items and loop back through until I run out of larger items.

好吧,我一直在玩这个,并提出了一个适合我的目的的解决方案。我基本上将较大的项目混合到较小的项目中并循环回来,直到我用完较大的项目。

For Each item In smallerList
  mergedList.add(smallerID)
Next

itemsRemaining = biggerList.Count

While itemsRemaining > 0
  index = 0

  For i = 1 To smallerList.Count
    If index >= mergedList.Count or itemsRemaining = 0 Then Continue While

    mergedList.Insert(index , largerID)
    index += 2 + loopCount
    itemsRemaining -= 1
  Next

  loopCount += 1
End While

Then I can replace the IDs with the actual items from the two lists.

然后我可以用两个列表中的实际项替换ID。

So, for my original example (Group 1 with 42 items and Group 2 with 16), I end up with:

因此,对于我的原始示例(第1组包含42个项目,第2组包含16个项目),我最终得到:

111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 11 2 11 2 11 2 11 2 11 2 11 2

111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 111 2 11 2 11 2 11 2 11 2 11 2 11 2

It's a bit front loaded, but for my purposes, this will work out just fine.

这有点前端,但为了我的目的,这将很好。

#3


1) You could concatenate the two groups and do simple sampling from the combined group, for instance by shuffling the elements and iterating through the shuffled combined set.

1)您可以连接两个组并从组合组中进行简单采样,例如通过对元素进行混洗并迭代混合组合集。

2) If you'd rather do it sequentialy you could sample from each group with probabilities size(A) / (size(A) + size(B)) and size(B) / (size(A) + size(B)), where size(A) and size(B) are the current numbers of elements in groups A and B, respectively, that haven't yet been sampled. In other words, if U is a draw from a Uniform(0,1) random number generator:

2)如果您更愿意按顺序进行,您可以从每组中采样概率大小(A)/(大小(A)+大小(B))和大小(B)/(大小(A)+大小(B) ),其中尺寸(A)和尺寸(B)分别是尚未采样的A组和B组中的当前元素数。换句话说,如果U是来自Uniform(0,1)随机数生成器的绘制:

if U <= size(A) / (size(A) + size(B))
   randomly draw next observation from A
else
   randomly draw next observation from B

In both approaches the A's and the B's end up uniformly distributed across the range, which is the statistical description of a "fairly even distribution".

在这两种方法中,A和B最终均匀分布在整个范围内,这是“相当均匀分布”的统计描述。

You didn't specify a language, so here are concrete implementations of both approaches in Ruby. I've cut the set sizes in half to keep the output length reasonable, and obviously these will both produce different results each time they're run due to the use of randomness.

您没有指定语言,因此这里是Ruby中两种方法的具体实现。我将设置的大小减半,以保持输出长度合理,显然,由于使用随机性,每次运行时都会产生不同的结果。

First approach:

a = ['A'] * 21
b = ['B'] * 8
c = (a + b).shuffle
puts c.join(',')

which, for example, produced the following output:

例如,它产生以下输出:

A,A,A,A,A,B,A,A,A,A,A,B,B,B,A,B,A,A,A,A,A,A,A,A,A,B,B,A,B

Second approach:

a = ['A'] * 21
b = ['B'] * 8
c = []    
while a.length > 0 || b.length > 0
  c << (rand <= (a.length / (a.length + b.length).to_f) ? a.shift : b.shift)
end    
puts c.join(',')

which, for example, produced the following output:

例如,它产生以下输出:

A,A,B,A,A,A,B,B,A,A,A,A,A,A,A,B,B,B,A,A,A,B,A,A,A,B,A,A,A