找到第一个可用名称的高效算法

时间:2022-08-23 11:16:18

I have an array containing names of items. I want to give the user the option to create items without specifying their name, so my program will have to supply a unique default name, like "Item 1".

我有一个包含项目名称的数组。我想让用户选择创建项目而不指定他们的名字,所以我的程序必须提供一个唯一的默认名称,如“Item 1”。

The challenge is that the name has to be unique so i have to check all the array for that default name, and if there is an item with the same name i have to change the name to be "Item 2" and so on until i find an available name.

挑战是名称必须是唯一的,所以我必须检查所有数组的默认名称,如果有一个具有相同名称的项目,我必须将名称更改为“项目2”,依此类推,直到我找到可用的名称。

The obvious solution will be something like that:

显而易见的解决方案是这样的:

String name = "Item ";
for (int i = 0; !isAvailable(name + i) ; i++);

My problem with that algorithm is that it runs at O(N^2).

我的算法问题是它运行在O(N ^ 2)。

I wonder if there is a known (or new) more efficient algorithm to solve this case.

我想知道是否有一种已知的(或新的)更有效的算法来解决这种情况。

In other words my question is this: Is there any algorithm that finds the first greater-than-zero number that dose NOT exist in a given array, and runs at less that N^2?

换句话说,我的问题是:是否有任何算法可以找到在给定数组中不存在的第一个大于零的数字,并且运行的数量少于N ^ 2?

7 个解决方案

#1


4  

You can certainly do it in O(N) time, N being the number of items in your array:

您当然可以在O(N)时间内完成,N是数组中的项目数:

  • One of "Item 1", "Item 2", ... "Item N+1" must be free, so create an array of N+1 flags.
  • “项目1”,“项目2”,......“项目N + 1”之一必须是空闲的,因此创建一个N + 1标志数组。

  • Traverse the items, and for each name if it is of form "Item k" with 0 < k <= N+1, set that flag.
  • 遍历项目,如果每个名称的形式为“项目k”且0

  • Scan the flag array for the first clear flag.
  • 扫描标志数组以获取第一个清除标志。

Additional memory requirement is N+1 bits, which certainly beats any data structure that actually stores all N names.

额外的内存要求是N + 1位,这肯定胜过实际存储所有N个名称的任何数据结构。

#2


3  

Yes, there is.

就在这里。

First sort the array. Then run through it and return the first element whose value is not equal to its index (plus 1). The sort is O(n log n), the final step is O(n), so the entire thing is O(n log n).

首先对数组进行排序。然后运行它并返回其值不等于其索引的第一个元素(加1)。排序是O(n log n),最后一步是O(n),所以整个过程是O(n log n)。

If you put all items into a hash table, you can do it in O(n) at the cost of some space and an additional O(1) step at creation of new items. Since each element needs to be visited, O(n) is clearly optimal.

如果将所有项目放入哈希表中,则可以在O(n)中以某些空间为代价并在创建新项目时执行额外的O(1)步骤。由于需要访问每个元素,因此O(n)显然是最佳的。

I'd be interested to see if there's an O(n) way to do this, without using any "persistent" data structure like the hash table. (And assuming unbounded integers, otherwise a bucket sort could be used as an O(n) sorting algorithm).

我有兴趣看看是否有一种O(n)方法可以做到这一点,而不使用像哈希表这样的任何“持久”数据结构。 (并且假设*整数,否则桶排序可以用作O(n)排序算法)。

#3


1  

Why not just keep track of the maximum number to date, and when you need a new one, increment it?

为什么不跟踪到目前为止的最大数量,何时需要新的数量,增加它?

#4


1  

Insert all of the existing names into a hash table. Repeat your loop, but make isAvailable check the hash table. Assuming a decent hash, it's O(nh) where h is the cost of evaluating the hash.

将所有现有名称插入哈希表。重复你的循环,但使isAvailable检查哈希表。假设一个合适的散列,它是O(nh),其中h是评估散列的成本。

#5


1  

You could try to do the following:

您可以尝试执行以下操作:

first:

  • loop through the list, and get all numbered items, this is complexity N
  • 循环遍历列表,并获取所有编号的项目,这是复杂度N.

  • for every numbered item, put the item in a tree (in C++: std::map), this is complexity log(N)
  • 对于每个编号的项目,将项目放在树中(在C ++:std :: map中),这是复杂性日志(N)

So now you have built up a map with the used numbers, with complexity "N x log(N)"

所以现在你已经建立了一个使用了数字的地图,复杂度为“N x log(N)”

Next, iterate to the tree and as soon you see a 'hole', use the number. Worst case is complexity N.

接下来,迭代到树,一旦看到“洞”,请使用数字。最坏的情况是复杂性N.

So in total, the complexity is N x log(N) + N, or simplified: N log(N).

因此总的来说,复杂度是N x log(N)+ N,或简化为:N log(N)。

#6


0  

If there will be only one user accessing this array, why not use the number of nanoseconds? This, of course, assumes that the user is much slower than a computer, which seems to be a safe assumption.

如果只有一个用户访问此阵列,为什么不使用纳秒数?当然,这假设用户比计算机慢得多,这似乎是一个安全的假设。

That way, you have a O(1) cost for determining a unique name.

这样,您确定唯一名称的成本为O(1)。

#7


0  

A logarithmic-time approach, assuming that you never leave "holes" by deleting items:

假设您永远不会通过删除项目留下“漏洞”,采用对数时间方法:

// Inverse binary search for the last one.
int upperBound = 1;
while(isInUse(upperBound)) upperBound *= 2;

// Standard binary search for the end once we have a ballpark.
int lowerBound = upperBound / 2;
while(lowerBound < upperBound - 1)
{
    int midpoint = (lowerBound + upperBound) / 2;
    if (isInUse(midpoint))
        lowerBound = midpoint;
    else
        upperBound = midpoint;
}
return upperBound;

If item numbers can be freed by deleting, nothing short of linear search will work unless you also keep a "free list" and pick from that.

如果可以通过删除释放项目编号,除非您还保留“免费列表”并从中进行选择,否则任何线性搜索都不会起作用。

#1


4  

You can certainly do it in O(N) time, N being the number of items in your array:

您当然可以在O(N)时间内完成,N是数组中的项目数:

  • One of "Item 1", "Item 2", ... "Item N+1" must be free, so create an array of N+1 flags.
  • “项目1”,“项目2”,......“项目N + 1”之一必须是空闲的,因此创建一个N + 1标志数组。

  • Traverse the items, and for each name if it is of form "Item k" with 0 < k <= N+1, set that flag.
  • 遍历项目,如果每个名称的形式为“项目k”且0

  • Scan the flag array for the first clear flag.
  • 扫描标志数组以获取第一个清除标志。

Additional memory requirement is N+1 bits, which certainly beats any data structure that actually stores all N names.

额外的内存要求是N + 1位,这肯定胜过实际存储所有N个名称的任何数据结构。

#2


3  

Yes, there is.

就在这里。

First sort the array. Then run through it and return the first element whose value is not equal to its index (plus 1). The sort is O(n log n), the final step is O(n), so the entire thing is O(n log n).

首先对数组进行排序。然后运行它并返回其值不等于其索引的第一个元素(加1)。排序是O(n log n),最后一步是O(n),所以整个过程是O(n log n)。

If you put all items into a hash table, you can do it in O(n) at the cost of some space and an additional O(1) step at creation of new items. Since each element needs to be visited, O(n) is clearly optimal.

如果将所有项目放入哈希表中,则可以在O(n)中以某些空间为代价并在创建新项目时执行额外的O(1)步骤。由于需要访问每个元素,因此O(n)显然是最佳的。

I'd be interested to see if there's an O(n) way to do this, without using any "persistent" data structure like the hash table. (And assuming unbounded integers, otherwise a bucket sort could be used as an O(n) sorting algorithm).

我有兴趣看看是否有一种O(n)方法可以做到这一点,而不使用像哈希表这样的任何“持久”数据结构。 (并且假设*整数,否则桶排序可以用作O(n)排序算法)。

#3


1  

Why not just keep track of the maximum number to date, and when you need a new one, increment it?

为什么不跟踪到目前为止的最大数量,何时需要新的数量,增加它?

#4


1  

Insert all of the existing names into a hash table. Repeat your loop, but make isAvailable check the hash table. Assuming a decent hash, it's O(nh) where h is the cost of evaluating the hash.

将所有现有名称插入哈希表。重复你的循环,但使isAvailable检查哈希表。假设一个合适的散列,它是O(nh),其中h是评估散列的成本。

#5


1  

You could try to do the following:

您可以尝试执行以下操作:

first:

  • loop through the list, and get all numbered items, this is complexity N
  • 循环遍历列表,并获取所有编号的项目,这是复杂度N.

  • for every numbered item, put the item in a tree (in C++: std::map), this is complexity log(N)
  • 对于每个编号的项目,将项目放在树中(在C ++:std :: map中),这是复杂性日志(N)

So now you have built up a map with the used numbers, with complexity "N x log(N)"

所以现在你已经建立了一个使用了数字的地图,复杂度为“N x log(N)”

Next, iterate to the tree and as soon you see a 'hole', use the number. Worst case is complexity N.

接下来,迭代到树,一旦看到“洞”,请使用数字。最坏的情况是复杂性N.

So in total, the complexity is N x log(N) + N, or simplified: N log(N).

因此总的来说,复杂度是N x log(N)+ N,或简化为:N log(N)。

#6


0  

If there will be only one user accessing this array, why not use the number of nanoseconds? This, of course, assumes that the user is much slower than a computer, which seems to be a safe assumption.

如果只有一个用户访问此阵列,为什么不使用纳秒数?当然,这假设用户比计算机慢得多,这似乎是一个安全的假设。

That way, you have a O(1) cost for determining a unique name.

这样,您确定唯一名称的成本为O(1)。

#7


0  

A logarithmic-time approach, assuming that you never leave "holes" by deleting items:

假设您永远不会通过删除项目留下“漏洞”,采用对数时间方法:

// Inverse binary search for the last one.
int upperBound = 1;
while(isInUse(upperBound)) upperBound *= 2;

// Standard binary search for the end once we have a ballpark.
int lowerBound = upperBound / 2;
while(lowerBound < upperBound - 1)
{
    int midpoint = (lowerBound + upperBound) / 2;
    if (isInUse(midpoint))
        lowerBound = midpoint;
    else
        upperBound = midpoint;
}
return upperBound;

If item numbers can be freed by deleting, nothing short of linear search will work unless you also keep a "free list" and pick from that.

如果可以通过删除释放项目编号,除非您还保留“免费列表”并从中进行选择,否则任何线性搜索都不会起作用。