Top K elements selection

Hello Community,

Suppose I have an array of size 1 million, and the goal is to select the top 1000 largest elements in the array. I am thinking if there is any efficient way to solve it in GPU.

I’ve searched the forum and make a summary on what I find.

  1. sort and selection. It overkill the problem since only 1000 out of 1 million elements return.
  2. make a heap data structure in GPU, and heapify in parallel.

Solution 1 is not efficient. Regarding solution 2, my concern is how to make a heap data structure in
GPU, and heapify in parallel might also have race condition.

Any input is welcome.
Thank you.

personally, i would consider:

finding the minimum, min, within the array of 1m elements
select a pivot equal to min + value and count the number of elements less than the pivot
iterate on the pivot value until i have the number of elements less than the pivot, greater than (but not by too much) or equal to 1000
gather the elements less than the pivot

and this is an algorithm that can easily be parallelized across multiple thread blocks - i.e. you can use multiple thread blocks to simultaneously work on the problem/ algorithm

google ‘k-selection GPU’ or ‘k-select GPU’

Reveals papers and cuda code like:

1 Like

Thank you for the kind reply.

Here is the summary for who will have the same problem in the future.

  1. sort and select. If the array size is very big, and it will overkill the problem.
  2. heap data structure. updating the heap in parallel needs atomic function, essentially,
    the updating operations would be serialized.
  3. Randomized Selection on the GPU.
  4. radixSelect and bucketSelect.

Credit go to little_jimmy and HannesF99.

Thank you.