I’d like to find order statistics (e.g., the median, or the 10th-largest value) of each array in a set of arrays.
It seems that neither Thrust nor CUDPP implements these, although both include sorting functions. However, sorting is probably not the most efficient approach, and there don’t seem to be any batch versions for performing many sorts in parallel.
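To illustrate why a full sort is more work than needed, here is a minimal sketch of batched selection with quickselect. It assumes all arrays have the same length n and are stored back to back in one buffer. It uses one thread per array, and each thread reorders its own array in place. The names batchedSelectKernel and batchedSelect are hypothetical and do not come from any library. One thread per array keeps the code simple, but it diverges and does not coalesce memory accesses, so treat it as an illustration of the idea rather than a tuned implementation.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index.
__device__ int partitionLomuto(float* a, int lo, int hi) {
    float pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (a[j] < pivot) {
            float t = a[i]; a[i] = a[j]; a[j] = t;
            ++i;
        }
    }
    float t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

// One thread per array: iterative quickselect for the k-th smallest element
// (0-based). Each array is partially reordered in place.
__global__ void batchedSelectKernel(float* data, int numArrays, int n, int k,
                                    float* out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= numArrays) return;
    float* a = data + (size_t)idx * n;
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        // Use the middle element as the pivot (moved to a[hi]) to avoid the
        // worst case on already-sorted input.
        int mid = lo + (hi - lo) / 2;
        float t = a[mid]; a[mid] = a[hi]; a[hi] = t;
        int p = partitionLomuto(a, lo, hi);
        if (p == k) break;
        if (k < p) hi = p - 1; else lo = p + 1;
    }
    out[idx] = a[k];
}

// Hypothetical host helper: copies the arrays to the GPU, selects the k-th
// smallest value of each, and returns one result per array.
// CUDA error checking is omitted for brevity.
std::vector<float> batchedSelect(const std::vector<float>& h, int numArrays,
                                 int n, int k) {
    assert(n > 0 && k >= 0 && k < n);
    assert(h.size() == (size_t)numArrays * n);
    float *dData = nullptr, *dOut = nullptr;
    cudaMalloc(&dData, h.size() * sizeof(float));
    cudaMalloc(&dOut, numArrays * sizeof(float));
    cudaMemcpy(dData, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    int threads = 128;
    int blocks = (numArrays + threads - 1) / threads;
    batchedSelectKernel<<<blocks, threads>>>(dData, numArrays, n, k, dOut);
    std::vector<float> result(numArrays);
    cudaMemcpy(result.data(), dOut, numArrays * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dOut);
    return result;
}
```

For example, batchedSelect(h, 2, 5, 2) on the two arrays {5, 1, 4, 2, 3} and {10, 30, 20, 40, 0} returns their medians, 3 and 20.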
Are there good CUDA libraries for (batch) order statistics?
I am not aware of any ready-to-use software downloads, but this is not a field I monitor closely. If I understand your use case correctly, there has been published work on such functionality, for example:
Note that sorting many smallish sub-arrays in parallel can be extremely fast. My own parallel sort algorithm sorts 32K arrays of 1K 32-bit elements each in about 3.5 ms on a GTX 680.
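The answerer's sort algorithm is not shown here. As a generic sketch of the "sort each small array in parallel, then pick the k-th element" approach, the kernel below runs a textbook bitonic sort in shared memory with one thread block per array. It assumes n is a power of two and at most 1024, because it uses one thread per element. The names blockBitonicSelect and batchedSortSelect are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// One block per array, one thread per element. Sorts the array ascending in
// shared memory with a bitonic network, then writes element k (0-based).
__global__ void blockBitonicSelect(const float* data, int n, int k, float* out) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    s[tid] = data[(size_t)blockIdx.x * n + tid];
    __syncthreads();
    for (int size = 2; size <= n; size <<= 1) {
        for (int stride = size >> 1; stride > 0; stride >>= 1) {
            int partner = tid ^ stride;
            if (partner > tid) {
                // Direction of this compare-exchange within the bitonic merge.
                bool ascending = ((tid & size) == 0);
                float x = s[tid], y = s[partner];
                if ((x > y) == ascending) { s[tid] = y; s[partner] = x; }
            }
            __syncthreads();
        }
    }
    if (tid == 0) out[blockIdx.x] = s[k];
}

// Hypothetical host helper: sorts each of numArrays arrays of length n on the
// GPU and returns the k-th smallest value of each. Error checking is omitted.
std::vector<float> batchedSortSelect(const std::vector<float>& h, int numArrays,
                                     int n, int k) {
    assert(n >= 2 && n <= 1024 && (n & (n - 1)) == 0);  // power of two
    assert(k >= 0 && k < n);
    assert(h.size() == (size_t)numArrays * n);
    float *dData = nullptr, *dOut = nullptr;
    cudaMalloc(&dData, h.size() * sizeof(float));
    cudaMalloc(&dOut, numArrays * sizeof(float));
    cudaMemcpy(dData, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    blockBitonicSelect<<<numArrays, n, n * sizeof(float)>>>(dData, n, k, dOut);
    std::vector<float> result(numArrays);
    cudaMemcpy(result.data(), dOut, numArrays * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dOut);
    return result;
}
```

Since each array is fully sorted in shared memory, the same launch could return several order statistics at once, such as the median and the 10th-largest value, at no extra sorting cost. The price is that n is limited by the block size and shared memory.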