Radix sort in Double Precision


There’s a good SDK example in CUDA 2.2 of radix sort for integer and floating-point keys. It seems that similar tricks could also be applied to double precision (DP), even if the underlying hardware does not support DP operations. Is there any code available for doing this? Thanks.

Yep, Thrust sorts doubles with two 32-bit radix sorts using similar tricks. Here are some performance results.
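
To make the idea concrete, here is a rough host-side C++ sketch of the general trick. It is not Thrust's actual code: the names `double_to_key`, `key_to_double`, `radix_sort_32` and `sort_doubles` are made up for illustration, and it assumes 64-bit IEEE 754 doubles with no NaNs in the input. Each double is reinterpreted as a 64-bit integer and remapped so that unsigned integer order matches numeric order, then a stable radix sort is run over the low 32 bits followed by the high 32 bits. Only integer operations touch the keys, which is why hardware DP support is not needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a double's bit pattern to a 64-bit key whose unsigned order matches
// the numeric order of the doubles (assumes IEEE 754 binary64, no NaNs).
// Negative values: flip all bits. Non-negative values: set the sign bit.
inline std::uint64_t double_to_key(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    const std::uint64_t sign = std::uint64_t(1) << 63;
    return (bits & sign) ? ~bits : (bits | sign);
}

// Inverse of double_to_key.
inline double key_to_double(std::uint64_t key) {
    const std::uint64_t sign = std::uint64_t(1) << 63;
    const std::uint64_t bits = (key & sign) ? (key & ~sign) : ~key;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Stable LSD radix sort over one 32-bit slice of each key, processed as
// four 8-bit digits starting at bit position `shift`.
void radix_sort_32(std::vector<std::uint64_t>& keys, unsigned shift) {
    std::vector<std::uint64_t> tmp(keys.size());
    for (unsigned byte = 0; byte < 4; ++byte) {
        const unsigned s = shift + 8 * byte;
        std::size_t count[257] = {0};
        // Histogram of digit values, offset by one for the prefix sum.
        for (std::uint64_t k : keys) ++count[((k >> s) & 0xFF) + 1];
        // Exclusive prefix sum: count[d] becomes the start offset of digit d.
        for (int i = 0; i < 256; ++i) count[i + 1] += count[i];
        // Stable scatter into the temporary buffer.
        for (std::uint64_t k : keys) tmp[count[(k >> s) & 0xFF]++] = k;
        keys.swap(tmp);
    }
}

// Sort doubles with two 32-bit radix sorts: low word first, then high word.
void sort_doubles(std::vector<double>& v) {
    std::vector<std::uint64_t> keys(v.size());
    std::transform(v.begin(), v.end(), keys.begin(), double_to_key);
    radix_sort_32(keys, 0);   // least significant 32 bits
    radix_sort_32(keys, 32);  // most significant 32 bits
    std::transform(keys.begin(), keys.end(), v.begin(), key_to_double);
}
```

Because each pass is stable, sorting by the low word first and the high word second leaves the keys ordered by the full 64 bits. On a GPU the two passes would be replaced by a parallel 32-bit radix sort, but the key remapping stays the same.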

Here are the relevant parts of the implementation: