I’ve been investigating sorting algorithms and I’m going to be hitting the GPU memory limit, so I’d very much like to use an in-place algorithm if possible, but I’m concerned I’ll take a big hit in speed over using an out-of-place radix sort for example.
I’m primarily concerned about sorting (int, int) key-value pairs, and the data I will be sorting will already be reasonably ordered.
It seems the best in-place sorting currently uses the bitonic sort:
Are there any other sorting algorithms (or combinations of algorithms) that I should consider that has lower complexity than the bitonic sort, but also lower memory usage than the radix sort?
I mention this algorithm without any regard for its suitability to a CUDA implementation (I haven’t really thought too much about it). The smoothsort: Smoothsort - Wikipedia
It is worst case O(n log n) and in the best case of already ordered data it is O(n) with a smooth transition between the two as the input becomes more ordered. Its additional memory requirements are very low although it isn’t strictly in-place.
It is might be possible to implement efficiently with CUDA, but it is not likely to be easy.