Small sort function in kernel Is there a function similar to qsort in cuda?


I need to sort a short array of values (typically 6) according to a custom comparison function inside a kernel - is there a function similar to qsort in cuda? It seems strange to have to code such a basic thing oneself… ;-)

Thanks in advance for your advice

Well, for six values even the poorest hand-written bubble sort algorithm (or even bogosort !) will be faster than a qsort.

Definitely a bubble sort if you have 6 values per thread.

If you had more than about 10, then a Shell sort could be faster.

Both bubble and Shell are well suited to per-thread sorts because their access pattern is fixed, so there’s no thread divergence, no pointer or array indirection, and fixed termination. All of those properties are GPU friendly. The only bad part of Shell and bubble is they’re O(n^2).
Bubble sort is stable, but Shell is not. This may or may not matter to your application.

I have my own app that uses 12 items per thread and bubble worked fine.

Thanks for you quick replies!

Bubble-sort it is… Although wikipedia says, that insertion sort might be faster…?
I still think that this is such a general thing that it should be supported directly as a cuda function.
Have a nice day!

I haven’t done any experimenting myself, but maybe selection sort would be faster because it doesn’t do as many writes as bubble sort would? It would also be easier to use more than one thread to help parallelize by doing the reduction to find the minimum remaining element in parallel.