I need to sort a short array of values (typically 6) according to a custom comparison function inside a kernel - is there a function similar to qsort in cuda? It seems strange to have to code such a basic thing oneself… ;-)
Definitely a bubble sort if you have 6 values per thread.
If you had more than about 10, then a Shell sort could be faster.
Both bubble and Shell are well suited to per-thread sorts because their access pattern is fixed, so there’s no thread divergence, no pointer or array indirection, and fixed termination. All of those properties are GPU friendly. The only bad part of Shell and bubble is they’re O(n^2).
Bubble sort is stable, but Shell is not. This may or may not matter to your application.
I have my own app that uses 12 items per thread and bubble worked fine.
Bubble-sort it is… Although wikipedia says, that insertion sort might be faster…?
I still think that this is such a general thing that it should be supported directly as a cuda function.
Have a nice day!
I haven’t done any experimenting myself, but maybe selection sort would be faster because it doesn’t do as many writes as bubble sort would? It would also be easier to use more than one thread to help parallelize by doing the reduction to find the minimum remaining element in parallel.