Python二进制搜索类函数,用于查找排序列表中大于特定值的第一个数字

时间:2023-01-15 10:19:07

I'm trying to write a function in Python that finds the first number in a sorted list greater than a specific value that I pass in as an argument. I've found examples online that use simple list comprehensions to achieve this, but for my purposes I need to be performing this operation frequently and on large lists, so a search that runs in linear time is too expensive.

我正在尝试用Python编写一个函数,它找到排序列表中的第一个数字,该数字大于我作为参数传递的特定值。我在网上找到了使用简单列表推导来实现这一目的的例子,但出于我的目的,我需要经常在大型列表上执行此操作,因此在线性时间内运行的搜索过于昂贵。

I've had a crack at writing an iterative binary search-like function to achieve this, though I'm coming across some edge cases where it doesn't work correctly. By the way, the function is not required to deal with a case where there is no larger item in the list. Here is my existing function:

虽然我遇到了一些无法正常工作的边缘情况,但我在编写迭代二进制搜索类函数时遇到了麻烦。顺便说一下,该功能不需要处理列表中没有较大项目的情况。这是我现有的功能:

def findFirstLarger(num, sortedList):
    low = 0; 
    high = len(sortedList) - 1

    mid = -1
    while True:
        print("low: " + str(low) + "\t high: " + str(high))
        if (low > high):
            print("Ah geez, low is " + str(low) + " and high is " + str(high))
            return # debugging, don't want this to happen
        if low == high:
            return sortedList[low]
        else:
            mid = (low + high) / 2;
            if num == sortedList[mid]:
                return sortedList[mid]
            elif num > sortedList[mid]:
                low = mid + 1
            else:
                high = mid - 1

One case I have noted where this function does not work is as follows:

我已经注意到这个函数不起作用的一个案例如下:

>>> somenumbers=[n*2 for n in range(131072)]
>>> somenumbers[-5:]
[262134, 262136, 262138, 262140, 262142]


>>> binsearch.findFirstLarger(262139,somenumbers)
low: 0   high: 131071
low: 65536   high: 131071
low: 98304   high: 131071
low: 114688  high: 131071
low: 122880  high: 131071
low: 126976  high: 131071
low: 129024  high: 131071
low: 130048  high: 131071
low: 130560  high: 131071
low: 130816  high: 131071
low: 130944  high: 131071
low: 131008  high: 131071
low: 131040  high: 131071
low: 131056  high: 131071
low: 131064  high: 131071
low: 131068  high: 131071
low: 131070  high: 131071
low: 131070  high: 131069
Ah geez, low is 131070 and high is 131069

Here the correct result would be 262140, as this is the first number in the list greater than 262139.

这里正确的结果是262140,因为这是列表中第一个大于262139的数字。

Can anyone recommend a cleaner implementation of this that actually works? I didn't think this would be such an esoteric problem, though I haven't been able to find a solution anywhere as of yet.

任何人都可以推荐一个更实用的清洁实现吗?我不认为这会是一个如此深奥的问题,尽管我还没有找到任何解决方案。

2 个解决方案

#1


18  

Have you tried the bisect module?

你试过bisect模块吗?

def find_ge(a, key):
    '''Find smallest item greater-than or equal to key.
    Raise ValueError if no such item exists.
    If multiple keys are equal, return the leftmost.

    '''
    i = bisect_left(a, key)
    if i == len(a):
        raise ValueError('No item found with key at or above: %r' % (key,))
    return a[i]

find_ge(somenumbers, 262139)

Your code is wrong that (1) low > high is a valid termination case. (2) you should not stop at low == high, e.g. it will return an incorrect index when num == 3 for your somenumbers.

您的代码错误,(1)low> high是有效的终止案例。 (2)你不应该停在低位==高位,例如对于你的某些人,当num == 3时,它将返回一个不正确的索引。

#2


0  

If you need the implementation without bisect function, you can try the following code:

如果您需要没有bisect函数的实现,可以尝试以下代码:

def findFirstLargerOrEqual(num, sortedList):
    '''Finds the smallest index in the sortedList
    of the element which is greater-than or equal to num'''

    slen = len(sortedList)
    start = 0

    while slen > 0:
        m = start + slen//2

        if sortedList[m] < num:
            slen = slen - (m+1 - start)
            start = m+1
            continue

        if start < m and sortedList[m-1] >= num:
            slen = m - start
            continue

        return somenumbers[m]

    raise ValueError('Not found')

somenumbers=[n*2 for n in range(131072)]
print(findFirstLargerOrEqual(262139, somenumbers)) #output: 262140

#1


18  

Have you tried the bisect module?

你试过bisect模块吗?

def find_ge(a, key):
    '''Find smallest item greater-than or equal to key.
    Raise ValueError if no such item exists.
    If multiple keys are equal, return the leftmost.

    '''
    i = bisect_left(a, key)
    if i == len(a):
        raise ValueError('No item found with key at or above: %r' % (key,))
    return a[i]

find_ge(somenumbers, 262139)

Your code is wrong that (1) low > high is a valid termination case. (2) you should not stop at low == high, e.g. it will return an incorrect index when num == 3 for your somenumbers.

您的代码错误,(1)low> high是有效的终止案例。 (2)你不应该停在低位==高位,例如对于你的某些人,当num == 3时,它将返回一个不正确的索引。

#2


0  

If you need the implementation without bisect function, you can try the following code:

如果您需要没有bisect函数的实现,可以尝试以下代码:

def findFirstLargerOrEqual(num, sortedList):
    '''Finds the smallest index in the sortedList
    of the element which is greater-than or equal to num'''

    slen = len(sortedList)
    start = 0

    while slen > 0:
        m = start + slen//2

        if sortedList[m] < num:
            slen = slen - (m+1 - start)
            start = m+1
            continue

        if start < m and sortedList[m-1] >= num:
            slen = m - start
            continue

        return somenumbers[m]

    raise ValueError('Not found')

somenumbers=[n*2 for n in range(131072)]
print(findFirstLargerOrEqual(262139, somenumbers)) #output: 262140