For a CUDA application I need a fast sorting algorithm to sort coordinates lexicographically. I know CUDPP provides a sorter in the form of a radix sort combined with a merge sort.
Earlier GPGPU sorters were generally based on sorting networks, which have O(N log^2 N) complexity and considerable overhead.
What is the complexity of the CUDPP sorter? What are its advantages and disadvantages compared to a sorting network?