Parallel Algorithm For The Mode Of An Integer Array(number with the highest frequency)


I would like to implement a CUDA kernel for finding the mode of an integer array.
Could someone please point me to an algorithm for this?

It’s an interesting problem. The exact way to attack it crucially depends on the data itself… both the length of the list, the range of the integers, and the expected number of unique values.
For example, if the integers range only from 0 to 99, you’d probably have each block do a histogram with 100 bins, then do a final reduction to combine the histograms to tell you the exact distribution of the list (including the mode.)

If your integers ranged say over 32 bits but there are likely only 20 or 30 unique values, you could again use a histogram but use hashing to reduce the range into a countable number of bins.

If your integers ranged over 32 bits and they were scattered uniformly (meaning duplicates are rare, but you do want to find them) then it gets harder. One strategy is to take multiple passes through the data (likely done in parallel on a block by block basis) and each pass is responsible for a range of values, using a hash table counter to find duplicates.

And a final method… it’s inefficient and therefore less interesting theoretically… but it’s easy to code with libraries.

The method is simple: Use Thrust or CUDPP to sort the array of integers. Then just scan through it in order and keep track of which integer has the highest number of repeats.

Thanks SPWorley.
I too came to the sort on the GPU, then brute force scan on the CPU solution.
Since this is essentially a reduction problem, I was hoping there might be a specific algorithm (high on the wow scale).

After the sort, there are parallel ways to find the mode without resorting to copying to the cpu and scanning through the list. One possibility is do to do a reduce_by_key with the keys as the sorted sequence and the values all 1s.

keys = [1 1 2 5 4 4 4 6 6 3]

values = [1 1 1 1 1 1 1 1 1 1]


out_keys = [1 2 5 4 6 3]

out_vals = [2 1 1 3 2 1]

Then find the max of out_vals array and lookup the key in corresponding position of the out_keys array. If there are multiple values that appear the same number of times then you would have to do a bit more work.

In the first case the radix sort would be extremely fast since it would only have to do a very small number of passes, so it might not be that much slower than a binning approach. I’m not sure what the big-O complexity of reduce_by_key is, but the sort and max phases are O(N), which is what your other approaches are also, right?