如何在一个未排序的长度n (n)的数组中找到第k个最大的元素?

时间:2021-07-04 22:04:50

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

我相信有一种方法可以找到在O(n)中未排序的长度为n的数组中第k个最大的元素。或者是“期望”O(n)或别的什么。我们怎么做呢?

32 个解决方案

#1


157  

This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

这叫做求k阶统计量。有一个非常简单的随机算法(称为quickselect)O(n)平均时间,O(n ^ 2)最坏情况的时间,和一个很复杂的非随机性算法(称为introselect)O(n)最坏情况的时间。*上有一些信息,但不是很好。

Everything you need is in these powerpoint slides . Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

你需要的一切都在幻灯片里。为了提取O(n)最坏情况算法(introselect)的基本算法:

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.

这在Cormen等人的算法导论中也非常详细。

#2


110  

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

如果你想要一个真正的O(n)算法,而不是O(kn)或者类似的东西,那么你应该使用quickselect(它基本上是快速排序,你抛出你不感兴趣的分区)。我的教授写得很好,运行时分析:(引用)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

快速选择算法快速找到未排序的n个元素数组的k最小元素。这是一个随机的算法,所以我们计算最坏的期望运行时间。

Here is the algorithm.

这是算法。

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

这个算法的运行时间是多少?如果对手抛硬币给我们,我们可能会发现主元总是最大的元素k总是1,给出一个运行时间。

T(n) = Theta(n) + T(n-1) = Theta(n2)

But if the choices are indeed random, the expected running time is given by

但是如果选择确实是随机的,那么预期的运行时间就会给出。

T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

我们要做的不是完全合理的假设递归总是在A1或A2的大范围内。

Let's guess that T(n) <= an for some a. Then we get

让我们猜一下T(n) <= an,然后我们得到。

T(n) 
 <= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
 = cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

现在,我们要在右边得到一个可怕的求和符号来吸收左边的cn。如果我们把它限定为2(1/n) i=n/2到n an,我们得到大约2(1/n)(n/2)an = an。但这太大了,没有空间再挤进一个cn。我们用算术级数公式来展开求和

i=floor(n/2) to n i  
 = ∑i=1 to n i - ∑i=1 to floor(n/2) i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n2/2 - (n/4)2/2  
 = (15/32)n2

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

我们利用n“足够大”的优势,用更清洁(更小)的n/4替换掉丑陋的地板(n/2)因素。现在我们可以继续了。

cn + 2 (1/n) ∑i=floor(n/2) to n ai,
 <= cn + (2a/n) (15/32) n2
 = n (c + (15/16)a)
 <= an

provided a > 16c.

提供了一个> 16 c。

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).

这就得到T(n) = O(n)显然是(n)所以得到T(n) = (n)

#3


14  

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

一个快速的谷歌(第k个元素数组)返回了这个:http://discuss.joelonsoftware.com/default.asp?

"Make one pass through tracking the three largest values so far." 

(it was specifically for 3d largest)

(它是专为3d制作的)

and this answer:

这回答:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)

#4


11  

You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).

你喜欢快速排序。随机选择一个元素,然后把所有的东西都推到更高或更低的位置。此时,您将知道您实际选择了哪个元素,如果它是您所做的第k个元素,否则您将重复使用bin(更高或更低),第k个元素将会出现。从统计学上讲,找到第k个元素所花费的时间是n, O(n)。

#5


6  

A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.

一个程序员的算法分析的伙伴给出了一个O(n)的版本,尽管作者指出,常数因子是如此之高,你可能更倾向于天真的sort-the-list-然后选择的方法。

I answered the letter of your question :)

我回答了你的问题:)

#6


5  

The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

c++标准库几乎完全具有这个函数调用nth_element,尽管它确实修改了您的数据。它期望线性运行时,O(N),它也做了部分排序。

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a

#7


4  

Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java

虽然不是很确定O(n)的复杂度,但它肯定在O(n)和nLog(n)之间。也肯定比nLog(n)更接近O(n)。函数是用Java编写的。

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}

#8


4  

I implemented finding kth minimimum in n unsorted elements using dynamic programming, specifically tournament method. The execution time is O(n + klog(n)). The mechanism used is listed as one of methods on Wikipedia page about Selection Algorithm (as indicated in one of the posting above). You can read about the algorithm and also find code (java) on my blog page Finding Kth Minimum. In addition the logic can do partial ordering of the list - return first K min (or max) in O(klog(n)) time.

我在n个未排序的元素中使用动态规划,特别是比赛方法,实现了kth最小化。执行时间为O(n + klog(n))。所使用的机制被列为Wikipedia页面上关于选择算法的一种方法(如上面所示)。您可以在我的博客页面上阅读有关该算法的信息,并找到代码(java)。此外,该逻辑还可以对列表进行部分排序——在O(klog(n))时间内返回first K min(或max)。

Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.

虽然代码提供了结果kth最小值,但在O(klog(n))中可以使用类似的逻辑来查找kth最大值(klog(n)),忽略了创建比赛树的前期工作。

#9


3  

You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.

你可以用O(n + kn) = O(n) (k)表示时间和O(k)的空间,通过跟踪你所看到的k个最大的元素。

For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.

对于数组中的每个元素,您可以扫描k最大的列表并将最小的元素替换为新元素,如果它更大的话。

Warren's priority heap solution is neater though.

不过,沃伦的优先级堆解决方案更简洁。

#10


3  

Read Chapter 9, Medians and Other statistics from Cormen's "Introduction to Algorithms", 2nd Ed. It has an expected linear time algorithm for selection. It's not something that people would randomly come up with in a few minutes.. A heap sort, btw, won't work in O(n), it's O(nlgn).

阅读第9章,Medians和其他来自Cormen的“算法介绍”的统计数据,第2版,它有一个期望的线性时间算法来选择。这不是人们在几分钟内就会随机想到的事情。堆排序,顺便说一下,不会在O(n)中工作,它是O(nlgn)。

#11


2  

Find the median of the array in linear time, then use partition procedure exactly as in quicksort to divide the array in two parts, values to the left of the median lesser( < ) than than median and to the right greater than ( > ) median, that too can be done in lineat time, now, go to that part of the array where kth element lies, Now recurrence becomes: T(n) = T(n/2) + cn which gives me O (n) overal.

在线性时间找到数组的值,然后使用分区程序一样快速排序把数组分成两部分,左边的值中值较小(<)比中值和右大于(>)中,也可以通过lineat时间,现在,去那个数组k元素所在的一部分,现在复发就变成:T(n)= T(n / 2)+ cn给我O(n)扶持政策。

#12


2  

Sexy quickselect in Python

在Python中性感quickselect

def quickselect(arr, k):
    '''
     k = 1 returns first element in ascending order.
     can be easily modified to return first element in descending order
    '''

    r = random.randrange(0, len(arr))

    a1 = [i for i in arr if i < arr[r]] '''partition'''
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]

#13


2  

Below is the link to full implementation with quite an extensive explanation how the algorithm for finding Kth element in an unsorted algorithm works. Basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when smallest element is chosen as pivot in every step, so that algorithm degenerates into O(n^2) running time), special pivot selection is applied, called median-of-medians algorithm. The whole solution runs in O(n) time in worst and in average case.

下面是完整实现的链接,这是一个非常广泛的解释,说明如何在未排序的算法中找到第k个元素。基本思想是像快速排序那样对数组进行分区。但为了避免极端情况下(例如,当最小的元素在每一步选为支点,所以算法退化到O(n ^ 2)运行时间),特殊主选择应用,称为median-of-medians算法。在最坏的情况下,整个解决方案在O(n)时间内运行。

Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):

这是链接到全文(它是关于找到第k个最小的元素,但是原理是一样的,找到第k个最大):

Finding Kth Smallest Element in an Unsorted Array

在未排序的数组中找到第k个最小的元素。

#14


2  

As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.

根据这篇论文,在一个n项列表中找到第k个最大的项,下面的算法在最坏的情况下将花费O(n)时间。

  1. Divide the array in to n/5 lists of 5 elements each.
  2. 将数组分成n/5个元素列表,每个元素有5个元素。
  3. Find the median in each sub array of 5 elements.
  4. 查找每个子数组中5个元素的中值。
  5. Recursively find the median of all the medians, lets call it M
  6. 递归地找到所有中位数的中位数,我们叫它M。
  7. Partition the array in to two sub array 1st sub-array contains the elements larger than M , lets say this sub-array is a1 , while other sub-array contains the elements smaller then M., lets call this sub-array a2.
  8. 将数组划分为两个子数组第一个子数组包含大于M的元素,假设这个子数组是a1,而其他子数组包含小于M的元素。我们称这个子数组为a2。
  9. If k <= |a1|, return selection (a1,k).
  10. 如果k <= |a1|,返回选择(a1,k)。
  11. If k− 1 = |a1|, return M.
  12. 如果k 1 = |a1|,返回M。
  13. If k> |a1| + 1, return selection(a2,k −a1 − 1).
  14. 如果k > | a1 | + 1,返回选择a1(a2,k−−1)。

Analysis: As suggested in the original paper:

分析:如原论文所述:

We use the median to partition the list into two halves(the first half, if k <= n/2 , and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse in a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + .... = 2cn = o(n).

我们使用中位数将列表划分为两个半部分(如果k <= n/2,而另一半则相反)。这个算法需要时间cn在第一级的递归中,在下一层(因为我们递归在一个大小为n/2的列表中),第三层的cn/4,等等。总时间是cn + cn/2 + cn/4 + ....= 2 cn = o(n)。

Why partition size is taken 5 and not 3?

为什么划分大小是5而不是3?

As mentioned in original paper:

如原文所述:

Dividing the list by 5 assures a worst-case split of 70 − 30. Atleast half of the medians greater than the median-of-medians, hence atleast half of the n/5 blocks have atleast 3 elements and this gives a 3n/10 split, which means the other partition is 7n/10 in worst case. That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < 1, the worst-case running time isO(n).

列表除以5 70−30的保证最糟糕的分手。至少一半的medians大于median-of-medians,因此至少有一半的n/5块有至少3个元素,这就给出了一个3n/10的分割,这意味着在最坏的情况下,另一个分区是7n/10。T(n) = T(n/5)+T(7n/10)+O(n)由于n/5+7n/10 < 1,最坏情况的运行时间是isO(n)。

Now I have tried to implement the above algorithm as:

现在我尝试将上述算法实现为:

public static int findKthLargestUsingMedian(Integer[] array, int k) {
        // Step 1: Divide the list into n/5 lists of 5 element each.
        int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
        // Step 2: Find pivotal element aka median of medians.
        int medianOfMedian =  findMedianOfMedians(array, noOfRequiredLists);
        //Now we need two lists split using medianOfMedian as pivot. All elements in list listOne will be grater than medianOfMedian and listTwo will have elements lesser than medianOfMedian.
        List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
        List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
        for (Integer element : array) {
            if (element < medianOfMedian) {
                listWithSmallerNumbers.add(element);
            } else if (element > medianOfMedian) {
                listWithGreaterNumbers.add(element);
            }
        }
        // Next step.
        if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
        else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
        else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
        return -1;
    }

    public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
        int[] medians = new int[noOfRequiredLists];
        for (int count = 0; count < noOfRequiredLists; count++) {
            int startOfPartialArray = 5 * count;
            int endOfPartialArray = startOfPartialArray + 5;
            Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
            // Step 2: Find median of each of these sublists.
            int medianIndex = partialArray.length/2;
            medians[count] = partialArray[medianIndex];
        }
        // Step 3: Find median of the medians.
        return medians[medians.length / 2];
    }

Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).

为了完成任务,另一个算法使用优先队列,并花费时间O(nlogn)。

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums) {
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements - k + 1 > 0) {
            p = pq.poll();
            k++;
        }

        return p;
    }

Both of these algorithms can be tested as:

这两种算法都可以测试为:

public static void main(String[] args) throws IOException {
        Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        System.out.println(findKthLargestUsingMedian(numbers, 8));
        System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
    }

As expected output is: 18 18

预期产量为:1818。

#15


1  

iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)

遍历列表中。如果当前值大于存储的最大值,则将其存储为最大值,并将1-4和5从列表中删除。如果没有,把它和数字2比较,做同样的事情。重复,检查所有5个存储值。这应该在O(n)中

#16


1  

i would like to suggest one answer

我想建议一个答案。

if we take the first k elements and sort them into a linked list of k values

如果我们取第k个元素并把它们排序成一个k值的链表。

now for every other value even for the worst case if we do insertion sort for rest n-k values even in the worst case number of comparisons will be k*(n-k) and for prev k values to be sorted let it be k*(k-1) so it comes out to be (nk-k) which is o(n)

现在其他价值即使对最坏的情况下如果我们做插入排序休息n - k值即使在最坏的情况下的比较会为上一页k *(n - k)和k值进行排序让它被k *(k - 1)所以出来(nk-k)是o(n)

cheers

干杯

#17


1  

Explanation of the median - of - medians algorithm to find the k-th largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

对中位数- medians算法的解释,可以在这里找到n最大的整数:http://cs.indstate.edu/~spitla/ present.pdf。

Implementation in c++ is below:

c++的实现如下:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec){
//    Find median of a vector
    int median;
    size_t size = vec.size();
    median = vec[(size/2)];
    return median;
}

int findMedianOfMedians(vector<vector<int> > values){
    vector<int> medians;

    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }

    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k){
//    Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;

    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;

        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout<<endl<<endl<<"Printing 2D vector : "<<endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout<<vec2D[i][j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

//    Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout<<"Median of medians is : "<<m<<endl;

//    Partition the list into unique elements larger than 'm' (call this sublist L1) and
//    those smaller them 'm' (call this sublist L2)
    vector<int> L1, L2;

    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            }else if (vec2D[i][j] < m){
                L2.push_back(vec2D[i][j]);
            }
        }
    }

//    Checking the splits as per the new pivot 'm'
    cout<<endl<<"Printing L1 : "<<endl;
    for (int i = 0; i < L1.size(); i++) {
        cout<<L1[i]<<" ";
    }

    cout<<endl<<endl<<"Printing L2 : "<<endl;
    for (int i = 0; i < L2.size(); i++) {
        cout<<L2[i]<<" ";
    }

//    Recursive calls
    if ((k - 1) == L1.size()) {
        cout<<endl<<endl<<"Answer :"<<m;
    }else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    }else if (k > (L1.size() + 1)){
        return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
    }

}

int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};

    vector<int> vec(values, values + 25);

    cout<<"The given array is : "<<endl;
    for (int i = 0; i < vec.size(); i++) {
        cout<<vec[i]<<" ";
    }

    selectionByMedianOfMedians(vec, 8);

    return 0;
}

#18


1  

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

还有Wirth的选择算法,它的实现比QuickSelect更简单。Wirth的选择算法比QuickSelect慢,但是随着一些改进,它会变得更快。

In more detail. Using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection and paying some attention to the final steps of the partitioning part of the algorithm, i've came up with the following algorithm (imaginably named "LefSelect"):

更多细节。使用Vladimir Zabrodsky的MODIFIND优化和median-of-3 pivot选择,并对算法的分区部分的最后步骤稍加注意,我提出了以下算法(可想象的命名为“LefSelect”):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k & i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

In benchmarks that i did here, LefSelect is 20-30% faster than QuickSelect.

在我在这里做的基准测试中,LefSelect比QuickSelect快20-30%。

#19


1  

Haskell Solution:

Haskell的解决方案:

kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort []     = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
   ls = filter (<  x)
   rs = filter (>= x)

This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.

这实现了中值解决方案的中位数,通过使用withShape方法来发现分区的大小,而不用实际计算它。

#20


1  

Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.

这里是一个随机快速选择的c++实现。这个想法是随机选取一个主元。为了实现随机分区,我们使用一个随机函数rand()在l和r之间生成索引,用最后一个元素随机生成的索引交换元素,最后调用使用last元素作为pivot的标准分区进程。

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method.  ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort().  It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

The worst case time complexity of the above solution is still O(n2).In worst case, the randomized function may always pick a corner element. The expected time complexity of above randomized QuickSelect is Θ(n)

上述解决方案最糟糕的时间复杂度仍然是O(n2)。在最坏的情况下,随机函数可能总是选择一个角元素。的预期时间复杂度以上随机QuickSelectΘ(n)

#21


1  

How about this kinda approach

这种方法怎么样?

Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)

保持一个长度为k的缓冲区和一个tmp_max,得到tmp_max是O(k)并且执行了n次,所以类似于O(kn)

如何在一个未排序的长度n (n)的数组中找到第k个最大的元素?

Is it right or am i missing something ?

是对的还是我错过了什么?

Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.

虽然它并没有超出快速选择和最坏的中值统计方法的案例,但是它很容易理解和实现。

#22


0  

This is an implementation in Javascript.

这是Javascript的一个实现。

If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).

如果您释放了不能修改数组的约束,您可以使用两个索引来防止使用额外的内存,以确定“当前分区”(在经典的快速排序样式中)。

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2) 

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i  < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }   
    }
}  

If you want to test how it perform, you can use this variation:

如果你想测试它的表现,你可以使用这个变体:

    function kthMax (a, k, logging) {
         var comparisonCount = 0; //Number of comparison that the algorithm uses
         var memoryCount = 0;     //Number of integers in memory that the algorithm uses
         var _log = logging;

         if(k < 0 || k >= a.length) {
            if (_log) console.log ("k is out of range"); 
            return false;
         }      

         function _kthmax(a, k){
             var size = a.length;
             var pivot = a[parseInt(Math.random()*size)];
             if(_log) console.log("Inputs:", a,  "size="+size, "k="+k, "pivot="+pivot);

             // This should never happen. Just a nice check in this exercise
             // if you are playing with the code to avoid never ending recursion            
             if(typeof pivot === "undefined") {
                 if (_log) console.log ("Ops..."); 
                 return false;
             }

             var i, lowerArray = [], upperArray = [];
             for (i = 0; i  < size; i++){
                 var current = a[i];
                 if (current < pivot) {
                     comparisonCount += 1;
                     memoryCount++;
                     lowerArray.push(current);
                 } else if (current > pivot) {
                     comparisonCount += 2;
                     memoryCount++;
                     upperArray.push(current);
                 }
             }
             if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

             if(k <= upperArray.length) {
                 comparisonCount += 1;
                 return _kthmax(upperArray, k);
             } else if (k > size - lowerArray.length) {
                 comparisonCount += 2;
                 return _kthmax(lowerArray, k - (size - lowerArray.length));
             } else {
                 comparisonCount += 2;
                 return pivot;
             }
     /* 
      * BTW, this is the logic for kthMin if we want to implement that... ;-)
      * 

             if(k <= lowerArray.length) {
                 return kthMin(lowerArray, k);
             } else if (k > size - upperArray.length) {
                 return kthMin(upperArray, k - (size - upperArray.length));
             } else 
                 return pivot;
     */            
         }

         var result = _kthmax(a, k);
         return {result: result, iterations: comparisonCount, memory: memoryCount};
     }

The rest of the code is just to create some playground:

剩下的代码只是创建一些游戏:

    function getRandomArray (n){
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l))
        }

        return ar;
    }

    //Create a random array of 50 numbers
    var ar = getRandomArray (50);   

Now, run you tests a few time. Because of the Math.random() it will produce every time different results:

现在,运行测试几次。由于Math.random(),它将每次产生不同的结果:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.

如果你对它进行几次测试,你甚至可以从经验上看出,迭代次数平均为O(n) ~=常数* n,而k的值不影响算法。

#23


0  

I came up with this algorithm and seems to be O(n):

我提出了这个算法,似乎是O(n)

Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until end of the array. The minimum of our three variables is the 3rd largest item in the array.

假设k=3,我们想要找到数组中第三大的项。我将创建三个变量,并将数组的每个项与这三个变量的最小值进行比较。如果数组项大于我们的最小值,我们将用项目值替换最小变量。我们继续同样的事情直到数组结束。我们三个变量的最小值是数组中第三大的项。

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And, to find Kth largest item we need K variables.

要找到第K个最大的项,我们需要K个变量。

Example: (k=3)

例如:(k = 3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can someone please review this and let me know what I am missing?

有人能回顾一下,让我知道我遗漏了什么吗?

#24


0  

Here is the implementation of the algorithm eladv suggested(I also put here the implementation with random pivot):

这里是算法的实现建议(我也在这里用随机支点来实现):

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}

#25


0  

it is similar to the quickSort strategy, where we pick an arbitrary pivot, and bring the smaller elements to its left, and the larger to the right

它类似于快速排序策略,在这里我们选择一个任意的主元,并将较小的元素移到它的左边,并将较大的元素移到右边。

    public static int kthElInUnsortedList(List<int> list, int k)
    {
        if (list.Count == 1)
            return list[0];

        List<int> left = new List<int>();
        List<int> right = new List<int>();

        int pivotIndex = list.Count / 2;
        int pivot = list[pivotIndex]; //arbitrary

        for (int i = 0; i < list.Count && i != pivotIndex; i++)
        {
            int currentEl = list[i];
            if (currentEl < pivot)
                left.Add(currentEl);
            else
                right.Add(currentEl);
        }

        if (k == left.Count + 1)
            return pivot;

        if (left.Count < k)
            return kthElInUnsortedList(right, k - left.Count - 1);
        else
            return kthElInUnsortedList(left, k);
    }

#26


0  

Go to the End of this link : ...........

走到这条链接的末尾:

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

#27


0  

  1. Have Priority queue created.
  2. 优先队列创建。
  3. Insert all the elements into heap.
  4. 将所有元素插入堆中。
  5. Call poll() k times.

    电话调查()k乘以。

    public static int getKthLargestElements(int[] arr)
    {
        PriorityQueue<Integer> pq =  new PriorityQueue<>((x , y) -> (y-x));
        //insert all the elements into heap
        for(int ele : arr)
           pq.offer(ele);
        // call poll() k times
        int i=0;
        while(i&lt;k)
         {
           int result = pq.poll();
         } 
       return result;        
    }
    

#28


0  

You can find the kth smallest element in O(n) time and constant space. If we consider the array is only for integers.

你可以在O(n)时间和常数空间中找到第k个最小的元素。如果我们考虑数组只针对整数。

The approach is to do a binary search on the range of Array values. If we have a min_value and a max_value both in integer range, we can do a binary search on that range. We can write a comparator function which will tell us if any value is the kth-smallest or smaller than kth-smallest or bigger than kth-smallest. Do the binary search until you reach the kth-smallest number

方法是对数组值的范围进行二分查找。如果我们有一个min_value和一个max_value都在整数范围内,我们可以在这个范围内进行二分查找。我们可以写一个比较器函数,它会告诉我们,任何值都是k -最小的或者小于k -最小的或者大于k -最小的。进行二分查找,直到到达第k个最小的数?

Here is the code for that

这是代码。

class Solution:

类解决方案:

def _iskthsmallest(self, A, val, k):
    less_count, equal_count = 0, 0
    for i in range(len(A)):
        if A[i] == val: equal_count += 1
        if A[i] < val: less_count += 1

    if less_count >= k: return 1
    if less_count + equal_count < k: return -1
    return 0

def kthsmallest_binary(self, A, min_val, max_val, k):
    if min_val == max_val:
        return min_val
    mid = (min_val + max_val)/2
    iskthsmallest = self._iskthsmallest(A, mid, k)
    if iskthsmallest == 0: return mid
    if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
    return self.kthsmallest_binary(A, mid+1, max_val, k)

# @param A : tuple of integers
# @param B : integer
# @return an integer
def kthsmallest(self, A, k):
    if not A: return 0
    if k > len(A): return 0
    min_val, max_val = min(A), max(A)
    return self.kthsmallest_binary(A, min_val, max_val, k)

#29


0  

There is also one algorithm, that outperforms quickselect algorithm. It's called Floyd-Rivets (FR) algorithm.

还有一种算法,优于快速选择算法。它被称为floyd -铆钉(FR)算法。

Original article: https://doi.org/10.1145/360680.360694

原文:https://doi.org/10.1145/360680.360694

Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

版本下载:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf。

Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

*文章https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.

我尝试在c++中实现quickselect和FR算法。我还将它们与标准c++库实现std::nth_element(基本上是快速选择和heapselect的混合)进行比较。结果是快速选择,nth_element平均运行,但FR算法运行约。比他们快两倍。

Sample code that I used for FR algorithm:

我用于FR算法的样本代码:

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

#30


-1  

What I would do is this:

我要做的是

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

您可以简单地将指针存储到链表中的第一个和最后一个元素。只有当更新到列表时,它们才会改变。

Update:

更新:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

#1


157  

This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

这叫做求k阶统计量。有一个非常简单的随机算法(称为quickselect)O(n)平均时间,O(n ^ 2)最坏情况的时间,和一个很复杂的非随机性算法(称为introselect)O(n)最坏情况的时间。*上有一些信息,但不是很好。

Everything you need is in these powerpoint slides . Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

你需要的一切都在幻灯片里。为了提取O(n)最坏情况算法(introselect)的基本算法:

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.

这在Cormen等人的算法导论中也非常详细。

#2


110  

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

如果你想要一个真正的O(n)算法,而不是O(kn)或者类似的东西,那么你应该使用quickselect(它基本上是快速排序,你抛出你不感兴趣的分区)。我的教授写得很好,运行时分析:(引用)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

快速选择算法快速找到未排序的n个元素数组的k最小元素。这是一个随机的算法,所以我们计算最坏的期望运行时间。

Here is the algorithm.

这是算法。

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

这个算法的运行时间是多少?如果对手抛硬币给我们,我们可能会发现主元总是最大的元素k总是1,给出一个运行时间。

T(n) = Theta(n) + T(n-1) = Theta(n2)

But if the choices are indeed random, the expected running time is given by

但是如果选择确实是随机的,那么预期的运行时间就会给出。

T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

我们要做的不是完全合理的假设递归总是在A1或A2的大范围内。

Let's guess that T(n) <= an for some a. Then we get

让我们猜一下T(n) <= an,然后我们得到。

T(n) 
 <= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
 = cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

现在,我们要在右边得到一个可怕的求和符号来吸收左边的cn。如果我们把它限定为2(1/n) i=n/2到n an,我们得到大约2(1/n)(n/2)an = an。但这太大了,没有空间再挤进一个cn。我们用算术级数公式来展开求和

i=floor(n/2) to n i  
 = ∑i=1 to n i - ∑i=1 to floor(n/2) i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n2/2 - (n/4)2/2  
 = (15/32)n2

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

我们利用n“足够大”的优势,用更清洁(更小)的n/4替换掉丑陋的地板(n/2)因素。现在我们可以继续了。

cn + 2 (1/n) ∑i=floor(n/2) to n ai,
 <= cn + (2a/n) (15/32) n2
 = n (c + (15/16)a)
 <= an

provided a > 16c.

提供了一个> 16 c。

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).

这就得到T(n) = O(n)显然是(n)所以得到T(n) = (n)

#3


14  

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

一个快速的谷歌(第k个元素数组)返回了这个:http://discuss.joelonsoftware.com/default.asp?

"Make one pass through tracking the three largest values so far." 

(it was specifically for 3d largest)

(它是专为3d制作的)

and this answer:

这回答:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)

#4


11  

You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).

你喜欢快速排序。随机选择一个元素,然后把所有的东西都推到更高或更低的位置。此时,您将知道您实际选择了哪个元素,如果它是您所做的第k个元素,否则您将重复使用bin(更高或更低),第k个元素将会出现。从统计学上讲,找到第k个元素所花费的时间是n, O(n)。

#5


6  

A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.

一个程序员的算法分析的伙伴给出了一个O(n)的版本,尽管作者指出,常数因子是如此之高,你可能更倾向于天真的sort-the-list-然后选择的方法。

I answered the letter of your question :)

我回答了你的问题:)

#6


5  

The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

c++标准库几乎完全具有这个函数调用nth_element,尽管它确实修改了您的数据。它期望线性运行时,O(N),它也做了部分排序。

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a

#7


4  

Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java

虽然不是很确定O(n)的复杂度,但它肯定在O(n)和nLog(n)之间。也肯定比nLog(n)更接近O(n)。函数是用Java编写的。

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}

#8


4  

I implemented finding kth minimimum in n unsorted elements using dynamic programming, specifically tournament method. The execution time is O(n + klog(n)). The mechanism used is listed as one of methods on Wikipedia page about Selection Algorithm (as indicated in one of the posting above). You can read about the algorithm and also find code (java) on my blog page Finding Kth Minimum. In addition the logic can do partial ordering of the list - return first K min (or max) in O(klog(n)) time.

我在n个未排序的元素中使用动态规划,特别是比赛方法,实现了kth最小化。执行时间为O(n + klog(n))。所使用的机制被列为Wikipedia页面上关于选择算法的一种方法(如上面所示)。您可以在我的博客页面上阅读有关该算法的信息,并找到代码(java)。此外,该逻辑还可以对列表进行部分排序——在O(klog(n))时间内返回first K min(或max)。

Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.

虽然代码提供了结果kth最小值,但在O(klog(n))中可以使用类似的逻辑来查找kth最大值(klog(n)),忽略了创建比赛树的前期工作。

#9


3  

You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.

你可以用O(n + kn) = O(n) (k)表示时间和O(k)的空间,通过跟踪你所看到的k个最大的元素。

For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.

对于数组中的每个元素,您可以扫描k最大的列表并将最小的元素替换为新元素,如果它更大的话。

Warren's priority heap solution is neater though.

不过,沃伦的优先级堆解决方案更简洁。

#10


3  

Read Chapter 9, Medians and Other statistics from Cormen's "Introduction to Algorithms", 2nd Ed. It has an expected linear time algorithm for selection. It's not something that people would randomly come up with in a few minutes.. A heap sort, btw, won't work in O(n), it's O(nlgn).

阅读第9章,Medians和其他来自Cormen的“算法介绍”的统计数据,第2版,它有一个期望的线性时间算法来选择。这不是人们在几分钟内就会随机想到的事情。堆排序,顺便说一下,不会在O(n)中工作,它是O(nlgn)。

#11


2  

Find the median of the array in linear time, then use partition procedure exactly as in quicksort to divide the array in two parts, values to the left of the median lesser( < ) than than median and to the right greater than ( > ) median, that too can be done in lineat time, now, go to that part of the array where kth element lies, Now recurrence becomes: T(n) = T(n/2) + cn which gives me O (n) overal.

在线性时间找到数组的值,然后使用分区程序一样快速排序把数组分成两部分,左边的值中值较小(<)比中值和右大于(>)中,也可以通过lineat时间,现在,去那个数组k元素所在的一部分,现在复发就变成:T(n)= T(n / 2)+ cn给我O(n)扶持政策。

#12


2  

Sexy quickselect in Python

在Python中性感quickselect

def quickselect(arr, k):
    '''
     k = 1 returns first element in ascending order.
     can be easily modified to return first element in descending order
    '''

    r = random.randrange(0, len(arr))

    a1 = [i for i in arr if i < arr[r]] '''partition'''
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]

#13


2  

Below is the link to full implementation with quite an extensive explanation how the algorithm for finding Kth element in an unsorted algorithm works. Basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when smallest element is chosen as pivot in every step, so that algorithm degenerates into O(n^2) running time), special pivot selection is applied, called median-of-medians algorithm. The whole solution runs in O(n) time in worst and in average case.

下面是完整实现的链接,这是一个非常广泛的解释,说明如何在未排序的算法中找到第k个元素。基本思想是像快速排序那样对数组进行分区。但为了避免极端情况下(例如,当最小的元素在每一步选为支点,所以算法退化到O(n ^ 2)运行时间),特殊主选择应用,称为median-of-medians算法。在最坏的情况下,整个解决方案在O(n)时间内运行。

Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):

这是链接到全文(它是关于找到第k个最小的元素,但是原理是一样的,找到第k个最大):

Finding Kth Smallest Element in an Unsorted Array

在未排序的数组中找到第k个最小的元素。

#14


2  

As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.

根据这篇论文,在一个n项列表中找到第k个最大的项,下面的算法在最坏的情况下将花费O(n)时间。

  1. Divide the array in to n/5 lists of 5 elements each.
  2. 将数组分成n/5个元素列表,每个元素有5个元素。
  3. Find the median in each sub array of 5 elements.
  4. 查找每个子数组中5个元素的中值。
  5. Recursively find the median of all the medians, lets call it M
  6. 递归地找到所有中位数的中位数,我们叫它M。
  7. Partition the array in to two sub array 1st sub-array contains the elements larger than M , lets say this sub-array is a1 , while other sub-array contains the elements smaller then M., lets call this sub-array a2.
  8. 将数组划分为两个子数组第一个子数组包含大于M的元素,假设这个子数组是a1,而其他子数组包含小于M的元素。我们称这个子数组为a2。
  9. If k <= |a1|, return selection (a1,k).
  10. 如果k <= |a1|,返回选择(a1,k)。
  11. If k− 1 = |a1|, return M.
  12. 如果k 1 = |a1|,返回M。
  13. If k> |a1| + 1, return selection(a2,k −a1 − 1).
  14. 如果k > | a1 | + 1,返回选择a1(a2,k−−1)。

Analysis: As suggested in the original paper:

分析:如原论文所述:

We use the median to partition the list into two halves(the first half, if k <= n/2 , and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse in a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + .... = 2cn = o(n).

我们使用中位数将列表划分为两个半部分(如果k <= n/2,而另一半则相反)。这个算法需要时间cn在第一级的递归中,在下一层(因为我们递归在一个大小为n/2的列表中),第三层的cn/4,等等。总时间是cn + cn/2 + cn/4 + ....= 2 cn = o(n)。

Why partition size is taken 5 and not 3?

为什么划分大小是5而不是3?

As mentioned in original paper:

如原文所述:

Dividing the list by 5 assures a worst-case split of 70 − 30. Atleast half of the medians greater than the median-of-medians, hence atleast half of the n/5 blocks have atleast 3 elements and this gives a 3n/10 split, which means the other partition is 7n/10 in worst case. That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < 1, the worst-case running time isO(n).

列表除以5 70−30的保证最糟糕的分手。至少一半的medians大于median-of-medians,因此至少有一半的n/5块有至少3个元素,这就给出了一个3n/10的分割,这意味着在最坏的情况下,另一个分区是7n/10。T(n) = T(n/5)+T(7n/10)+O(n)由于n/5+7n/10 < 1,最坏情况的运行时间是isO(n)。

Now I have tried to implement the above algorithm as:

现在我尝试将上述算法实现为:

public static int findKthLargestUsingMedian(Integer[] array, int k) {
        // Step 1: Divide the list into n/5 lists of 5 element each.
        int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
        // Step 2: Find pivotal element aka median of medians.
        int medianOfMedian =  findMedianOfMedians(array, noOfRequiredLists);
        //Now we need two lists split using medianOfMedian as pivot. All elements in list listOne will be grater than medianOfMedian and listTwo will have elements lesser than medianOfMedian.
        List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
        List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
        for (Integer element : array) {
            if (element < medianOfMedian) {
                listWithSmallerNumbers.add(element);
            } else if (element > medianOfMedian) {
                listWithGreaterNumbers.add(element);
            }
        }
        // Next step.
        if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
        else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
        else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
        return -1;
    }

    public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
        int[] medians = new int[noOfRequiredLists];
        for (int count = 0; count < noOfRequiredLists; count++) {
            int startOfPartialArray = 5 * count;
            int endOfPartialArray = startOfPartialArray + 5;
            Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
            // Step 2: Find median of each of these sublists.
            int medianIndex = partialArray.length/2;
            medians[count] = partialArray[medianIndex];
        }
        // Step 3: Find median of the medians.
        return medians[medians.length / 2];
    }

Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).

为了完成任务,另一个算法使用优先队列,并花费时间O(nlogn)。

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums) {
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements - k + 1 > 0) {
            p = pq.poll();
            k++;
        }

        return p;
    }

Both of these algorithms can be tested as:

这两种算法都可以测试为:

public static void main(String[] args) throws IOException {
        Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        System.out.println(findKthLargestUsingMedian(numbers, 8));
        System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
    }

As expected output is: 18 18

预期产量为:1818。

#15


1  

iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)

遍历列表中。如果当前值大于存储的最大值,则将其存储为最大值,并将1-4和5从列表中删除。如果没有,把它和数字2比较,做同样的事情。重复,检查所有5个存储值。这应该在O(n)中

#16


1  

i would like to suggest one answer

我想建议一个答案。

if we take the first k elements and sort them into a linked list of k values

如果我们取第k个元素并把它们排序成一个k值的链表。

now for every other value even for the worst case if we do insertion sort for rest n-k values even in the worst case number of comparisons will be k*(n-k) and for prev k values to be sorted let it be k*(k-1) so it comes out to be (nk-k) which is o(n)

现在其他价值即使对最坏的情况下如果我们做插入排序休息n - k值即使在最坏的情况下的比较会为上一页k *(n - k)和k值进行排序让它被k *(k - 1)所以出来(nk-k)是o(n)

cheers

干杯

#17


1  

Explanation of the median - of - medians algorithm to find the k-th largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

对中位数- medians算法的解释,可以在这里找到n最大的整数:http://cs.indstate.edu/~spitla/ present.pdf。

Implementation in c++ is below:

c++的实现如下:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec){
//    Find median of a vector
    int median;
    size_t size = vec.size();
    median = vec[(size/2)];
    return median;
}

int findMedianOfMedians(vector<vector<int> > values){
    vector<int> medians;

    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }

    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k){
//    Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;

    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;

        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout<<endl<<endl<<"Printing 2D vector : "<<endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout<<vec2D[i][j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

//    Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout<<"Median of medians is : "<<m<<endl;

//    Partition the list into unique elements larger than 'm' (call this sublist L1) and
//    those smaller them 'm' (call this sublist L2)
    vector<int> L1, L2;

    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            }else if (vec2D[i][j] < m){
                L2.push_back(vec2D[i][j]);
            }
        }
    }

//    Checking the splits as per the new pivot 'm'
    cout<<endl<<"Printing L1 : "<<endl;
    for (int i = 0; i < L1.size(); i++) {
        cout<<L1[i]<<" ";
    }

    cout<<endl<<endl<<"Printing L2 : "<<endl;
    for (int i = 0; i < L2.size(); i++) {
        cout<<L2[i]<<" ";
    }

//    Recursive calls
    if ((k - 1) == L1.size()) {
        cout<<endl<<endl<<"Answer :"<<m;
    }else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    }else if (k > (L1.size() + 1)){
        return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
    }

}

int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};

    vector<int> vec(values, values + 25);

    cout<<"The given array is : "<<endl;
    for (int i = 0; i < vec.size(); i++) {
        cout<<vec[i]<<" ";
    }

    selectionByMedianOfMedians(vec, 8);

    return 0;
}

#18


1  

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

还有Wirth的选择算法,它的实现比QuickSelect更简单。Wirth的选择算法比QuickSelect慢,但是随着一些改进,它会变得更快。

In more detail. Using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection and paying some attention to the final steps of the partitioning part of the algorithm, i've came up with the following algorithm (imaginably named "LefSelect"):

更多细节。使用Vladimir Zabrodsky的MODIFIND优化和median-of-3 pivot选择,并对算法的分区部分的最后步骤稍加注意,我提出了以下算法(可想象的命名为“LefSelect”):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k & i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

In benchmarks that i did here, LefSelect is 20-30% faster than QuickSelect.

在我在这里做的基准测试中,LefSelect比QuickSelect快20-30%。

#19


1  

Haskell Solution:

Haskell的解决方案:

kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort []     = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
   ls = filter (<  x)
   rs = filter (>= x)

This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.

这实现了中值解决方案的中位数,通过使用withShape方法来发现分区的大小,而不用实际计算它。

#20


1  

Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.

这里是一个随机快速选择的c++实现。这个想法是随机选取一个主元。为了实现随机分区,我们使用一个随机函数rand()在l和r之间生成索引,用最后一个元素随机生成的索引交换元素,最后调用使用last元素作为pivot的标准分区进程。

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method.  ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort().  It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

The worst case time complexity of the above solution is still O(n2).In worst case, the randomized function may always pick a corner element. The expected time complexity of above randomized QuickSelect is Θ(n)

上述解决方案最糟糕的时间复杂度仍然是O(n2)。在最坏的情况下,随机函数可能总是选择一个角元素。的预期时间复杂度以上随机QuickSelectΘ(n)

#21


1  

How about this kinda approach

这种方法怎么样?

Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)

保持一个长度为k的缓冲区和一个tmp_max,得到tmp_max是O(k)并且执行了n次,所以类似于O(kn)

如何在一个未排序的长度n (n)的数组中找到第k个最大的元素?

Is it right or am i missing something ?

是对的还是我错过了什么?

Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.

虽然它并没有超出快速选择和最坏的中值统计方法的案例,但是它很容易理解和实现。

#22


0  

This is an implementation in Javascript.

这是Javascript的一个实现。

If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).

如果您释放了不能修改数组的约束,您可以使用两个索引来防止使用额外的内存,以确定“当前分区”(在经典的快速排序样式中)。

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2) 

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i  < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }   
    }
}  

If you want to test how it perform, you can use this variation:

如果你想测试它的表现,你可以使用这个变体:

    function kthMax (a, k, logging) {
         var comparisonCount = 0; //Number of comparison that the algorithm uses
         var memoryCount = 0;     //Number of integers in memory that the algorithm uses
         var _log = logging;

         if(k < 0 || k >= a.length) {
            if (_log) console.log ("k is out of range"); 
            return false;
         }      

         function _kthmax(a, k){
             var size = a.length;
             var pivot = a[parseInt(Math.random()*size)];
             if(_log) console.log("Inputs:", a,  "size="+size, "k="+k, "pivot="+pivot);

             // This should never happen. Just a nice check in this exercise
             // if you are playing with the code to avoid never ending recursion            
             if(typeof pivot === "undefined") {
                 if (_log) console.log ("Ops..."); 
                 return false;
             }

             var i, lowerArray = [], upperArray = [];
             for (i = 0; i  < size; i++){
                 var current = a[i];
                 if (current < pivot) {
                     comparisonCount += 1;
                     memoryCount++;
                     lowerArray.push(current);
                 } else if (current > pivot) {
                     comparisonCount += 2;
                     memoryCount++;
                     upperArray.push(current);
                 }
             }
             if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

             if(k <= upperArray.length) {
                 comparisonCount += 1;
                 return _kthmax(upperArray, k);
             } else if (k > size - lowerArray.length) {
                 comparisonCount += 2;
                 return _kthmax(lowerArray, k - (size - lowerArray.length));
             } else {
                 comparisonCount += 2;
                 return pivot;
             }
     /* 
      * BTW, this is the logic for kthMin if we want to implement that... ;-)
      * 

             if(k <= lowerArray.length) {
                 return kthMin(lowerArray, k);
             } else if (k > size - upperArray.length) {
                 return kthMin(upperArray, k - (size - upperArray.length));
             } else 
                 return pivot;
     */            
         }

         var result = _kthmax(a, k);
         return {result: result, iterations: comparisonCount, memory: memoryCount};
     }

The rest of the code is just to create some playground:

剩下的代码只是创建一些游戏:

    function getRandomArray (n){
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l))
        }

        return ar;
    }

    //Create a random array of 50 numbers
    var ar = getRandomArray (50);   

Now, run you tests a few time. Because of the Math.random() it will produce every time different results:

现在,运行测试几次。由于Math.random(),它将每次产生不同的结果:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.

如果你对它进行几次测试,你甚至可以从经验上看出,迭代次数平均为O(n) ~=常数* n,而k的值不影响算法。

#23


0  

I came up with this algorithm and seems to be O(n):

我提出了这个算法,似乎是O(n)

Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until end of the array. The minimum of our three variables is the 3rd largest item in the array.

假设k=3,我们想要找到数组中第三大的项。我将创建三个变量,并将数组的每个项与这三个变量的最小值进行比较。如果数组项大于我们的最小值,我们将用项目值替换最小变量。我们继续同样的事情直到数组结束。我们三个变量的最小值是数组中第三大的项。

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And, to find Kth largest item we need K variables.

要找到第K个最大的项,我们需要K个变量。

Example: (k=3)

例如:(k = 3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can someone please review this and let me know what I am missing?

有人能回顾一下,让我知道我遗漏了什么吗?

#24


0  

Here is the implementation of the algorithm eladv suggested(I also put here the implementation with random pivot):

这里是算法的实现建议(我也在这里用随机支点来实现):

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}

#25


0  

it is similar to the quickSort strategy, where we pick an arbitrary pivot, and bring the smaller elements to its left, and the larger to the right

它类似于快速排序策略,在这里我们选择一个任意的主元,并将较小的元素移到它的左边,并将较大的元素移到右边。

    public static int kthElInUnsortedList(List<int> list, int k)
    {
        if (list.Count == 1)
            return list[0];

        List<int> left = new List<int>();
        List<int> right = new List<int>();

        int pivotIndex = list.Count / 2;
        int pivot = list[pivotIndex]; //arbitrary

        for (int i = 0; i < list.Count && i != pivotIndex; i++)
        {
            int currentEl = list[i];
            if (currentEl < pivot)
                left.Add(currentEl);
            else
                right.Add(currentEl);
        }

        if (k == left.Count + 1)
            return pivot;

        if (left.Count < k)
            return kthElInUnsortedList(right, k - left.Count - 1);
        else
            return kthElInUnsortedList(left, k);
    }

#26


0  

Go to the End of this link : ...........

走到这条链接的末尾:

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

#27


0  

  1. Have Priority queue created.
  2. 优先队列创建。
  3. Insert all the elements into heap.
  4. 将所有元素插入堆中。
  5. Call poll() k times.

    电话调查()k乘以。

    public static int getKthLargestElements(int[] arr)
    {
        PriorityQueue<Integer> pq =  new PriorityQueue<>((x , y) -> (y-x));
        //insert all the elements into heap
        for(int ele : arr)
           pq.offer(ele);
        // call poll() k times
        int i=0;
        while(i&lt;k)
         {
           int result = pq.poll();
         } 
       return result;        
    }
    

#28


0  

You can find the kth smallest element in O(n) time and constant space. If we consider the array is only for integers.

你可以在O(n)时间和常数空间中找到第k个最小的元素。如果我们考虑数组只针对整数。

The approach is to do a binary search on the range of Array values. If we have a min_value and a max_value both in integer range, we can do a binary search on that range. We can write a comparator function which will tell us if any value is the kth-smallest or smaller than kth-smallest or bigger than kth-smallest. Do the binary search until you reach the kth-smallest number

方法是对数组值的范围进行二分查找。如果我们有一个min_value和一个max_value都在整数范围内,我们可以在这个范围内进行二分查找。我们可以写一个比较器函数,它会告诉我们,任何值都是k -最小的或者小于k -最小的或者大于k -最小的。进行二分查找,直到到达第k个最小的数?

Here is the code for that

这是代码。

class Solution:

类解决方案:

def _iskthsmallest(self, A, val, k):
    less_count, equal_count = 0, 0
    for i in range(len(A)):
        if A[i] == val: equal_count += 1
        if A[i] < val: less_count += 1

    if less_count >= k: return 1
    if less_count + equal_count < k: return -1
    return 0

def kthsmallest_binary(self, A, min_val, max_val, k):
    if min_val == max_val:
        return min_val
    mid = (min_val + max_val)/2
    iskthsmallest = self._iskthsmallest(A, mid, k)
    if iskthsmallest == 0: return mid
    if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
    return self.kthsmallest_binary(A, mid+1, max_val, k)

# @param A : tuple of integers
# @param B : integer
# @return an integer
def kthsmallest(self, A, k):
    if not A: return 0
    if k > len(A): return 0
    min_val, max_val = min(A), max(A)
    return self.kthsmallest_binary(A, min_val, max_val, k)

#29


0  

There is also one algorithm, that outperforms quickselect algorithm. It's called Floyd-Rivets (FR) algorithm.

还有一种算法,优于快速选择算法。它被称为floyd -铆钉(FR)算法。

Original article: https://doi.org/10.1145/360680.360694

原文:https://doi.org/10.1145/360680.360694

Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

版本下载:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf。

Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

*文章https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.

我尝试在c++中实现quickselect和FR算法。我还将它们与标准c++库实现std::nth_element(基本上是快速选择和heapselect的混合)进行比较。结果是快速选择,nth_element平均运行,但FR算法运行约。比他们快两倍。

Sample code that I used for FR algorithm:

我用于FR算法的样本代码:

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}

#30


-1  

What I would do is this:

我要做的是

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

您可以简单地将指针存储到链表中的第一个和最后一个元素。只有当更新到列表时,它们才会改变。

Update:

更新:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element