I’d like to find order statistics (e.g. the median values, or the 10th largest value) of a set of arrays.
It seems that neither Thrust nor CUDPP implements these, although both include functions for sorting. However, sorting is probably not the most efficient approach, and there don’t seem to be any batch versions for doing many sorts in parallel.
Are there good CUDA libraries for (batch) order statistics?
Batch sorting can be done in Thrust with a sequence of back-to-back sort-by-key operations.
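To make the back-to-back idea concrete, here is a minimal sketch, assuming the sub-arrays are stored contiguously in one device vector and a second vector holds the index of the sub-array each element belongs to. The names batch_sort, vals and seg are made up for illustration; only thrust::stable_sort_by_key is an actual Thrust call.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Sorts every sub-array (segment) of `vals` independently.
// seg[i] is the hypothetical index of the sub-array that vals[i] belongs to.
// On return, seg is non-decreasing and vals is ascending within each segment.
void batch_sort(thrust::device_vector<float>& vals,
                thrust::device_vector<int>& seg)
{
    // Pass 1: order all elements by value, carrying the segment ids along.
    thrust::stable_sort_by_key(vals.begin(), vals.end(), seg.begin());

    // Pass 2: stable sort by segment id, carrying the values along.
    // Stability keeps the value order from pass 1 inside each segment.
    thrust::stable_sort_by_key(seg.begin(), seg.end(), vals.begin());
}
```

If all sub-arrays have the same length m and seg starts out as 0,0,...,1,1,..., then after the call the k-th smallest element (0-based) of sub-array s sits at vals[s * m + k], so the median is at k = m / 2 for odd m. This does two full sorts of the whole data set, which is the overhead a selection algorithm would avoid.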
If you want to avoid sorting, sounds like what you want is a (batch) selection algorithm.
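For reference, a naive batch selection could look like the sketch below: one thread per sub-array, each running an in-place quickselect. This is not taken from any library; batch_select and its parameters are hypothetical, it assumes equal-length sub-arrays stored back to back and 0 <= k < len, and it is meant to show the idea rather than to be fast.

```cuda
#include <cuda_runtime.h>

// One thread per sub-array: in-place quickselect with a Lomuto partition.
// data holds num_arrays sub-arrays of length len, stored back to back.
// out[a] receives the k-th smallest (0-based) element of sub-array a.
// Assumes 0 <= k < len. Note that the contents of data get reordered.
__global__ void batch_select(float* data, int num_arrays, int len, int k,
                             float* out)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= num_arrays) return;

    float* x = data + (size_t)a * len;
    int lo = 0, hi = len - 1;
    while (lo < hi) {
        // Partition x[lo..hi] around the last element.
        float pivot = x[hi];
        int store = lo;
        for (int i = lo; i < hi; ++i) {
            if (x[i] < pivot) {
                float t = x[i]; x[i] = x[store]; x[store] = t;
                ++store;
            }
        }
        float t = x[store]; x[store] = x[hi]; x[hi] = t;

        // The pivot is now at its final sorted position `store`.
        if (store == k) break;
        if (store < k) lo = store + 1;   // k-th element is to the right
        else           hi = store - 1;   // k-th element is to the left
    }
    out[a] = x[k];
}
```

A launch such as batch_select<<<(num_arrays + 127) / 128, 128>>>(d_data, num_arrays, len, k, d_out) covers all sub-arrays. Expect poor memory coalescing and divergent threads with this layout, and quadratic worst-case time with this pivot choice, so a tuned segmented sort may well beat it in practice.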
Yes. Is there one for CUDA?
I am not aware of any ready-to-use software downloads, but this is not a field I monitor closely. If I understand your use case correctly, there has been published work on such functionality, for example:
If you don’t find a high-performance batch order statistic implementation, a general-purpose segmented sort is here:
How many sub-arrays are you expecting to sort?
Note that sorting many smallish sub-arrays in parallel can be extremely fast. My own parallel sort algorithm sorts 32K arrays of 1K 32-bit elements in about 3.5 ms on a GTX 680.