Hi,
There’s a good SDK example in CUDA 2.2 of radix sort for integer and single-precision floating-point keys. It seems that similar tricks could also be applied to double precision (DP), even if the underlying hardware does not support DP operations, since the sort itself only needs integer operations on the key bits. Is there any code available for doing this? Thanks.
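
To make the question concrete, here is a minimal sketch of the kind of trick I mean, assuming IEEE 754 doubles. It maps the 64 bits of a double to an unsigned integer whose ordering matches the floating-point ordering, using only integer operations. The function names are hypothetical and not taken from the SDK sample; it is written as plain C++ and I assume it would port to device code largely unchanged.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the CUDA SDK): reinterpret a double as a
// 64-bit unsigned key whose unsigned order matches the numeric order.
// Assumes IEEE 754 binary64 layout. NaNs are not handled specially.
inline std::uint64_t doubleToSortableKey(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // bit copy, no DP arithmetic
    // Negative values: flip every bit so larger magnitudes sort first.
    // Non-negative values: flip only the sign bit so they sort above negatives.
    std::uint64_t mask = (bits >> 63) ? ~0ULL : (1ULL << 63);
    return bits ^ mask;
}

// Inverse mapping: recover the original double from a sortable key.
inline double sortableKeyToDouble(std::uint64_t key) {
    // A set top bit means the original value was non-negative.
    std::uint64_t mask = (key >> 63) ? (1ULL << 63) : ~0ULL;
    std::uint64_t bits = key ^ mask;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

With keys converted this way, the sort would become a radix sort over 64-bit unsigned integers (for example, twice as many digit passes as for 32-bit keys), followed by the inverse mapping. Note that -0.0 ends up ordered just before +0.0 under this mapping.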