Parallel float sort: how to use CUDA to sort floats efficiently

Hello everyone,

I am a beginner with CUDA and I am looking for a sorting algorithm. I found two in the CUDA samples:

Bitonic sort: the number of elements must be a power of 2, and the number of threads must equal the number of elements. Someone asked for a multi-block version, but nobody answered.

Radix sort: it works for an arbitrary number of elements, but the sample only sorts integers. Can it be used for floats?

Could you please tell me an efficient way to sort in parallel? I would appreciate it very much if you could give some example code.

I would also like to know an efficient way to find the maximum and minimum of a large array.

Thanks for your attention!

You can use a radix sort to sort floating point numbers by rearranging the bits as described here:

There should be an example of this in the next release of the SDK.
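The link for the bit rearrangement did not survive in this thread, so here is a minimal sketch of one widely used form of the trick, not necessarily the one the reply pointed to. It assumes IEEE 754 single-precision floats. The names float_to_key, key_to_float and sort_via_keys are made up for illustration, and std::sort only stands in for the GPU radix sort. The two conversion functions are plain integer code, so they should carry over to a kernel if marked __host__ __device__.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float to a 32-bit key whose unsigned order matches the float order.
// Assumption: float is IEEE 754 binary32.
//   non-negative floats: set the sign bit (moves them above all negatives)
//   negative floats:     flip every bit (reverses their order)
inline std::uint32_t float_to_key(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // bit copy, no numeric conversion
    std::uint32_t mask = (u & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return u ^ mask;
}

// Inverse of float_to_key.
inline float key_to_float(std::uint32_t k) {
    std::uint32_t mask = (k & 0x80000000u) ? 0x80000000u : 0xFFFFFFFFu;
    std::uint32_t u = k ^ mask;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: convert to keys, sort the keys as unsigned integers,
// convert back. On the GPU the std::sort call would be the integer radix sort.
inline std::vector<float> sort_via_keys(const std::vector<float>& in) {
    std::vector<std::uint32_t> keys(in.size());
    std::transform(in.begin(), in.end(), keys.begin(), float_to_key);
    std::sort(keys.begin(), keys.end());  // stand-in for the radix sort
    std::vector<float> out(in.size());
    std::transform(keys.begin(), keys.end(), out.begin(), key_to_float);
    return out;
}
```

With this mapping, -0.0f sorts just before +0.0f, and NaNs end up at one end or the other depending on their sign bit, so filter them first if that matters.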

Thank you!

By the way, do you know a quick method to find the maximum and minimum values in an array of arbitrary size?

Finding the minimum and maximum in a dataset is a classic example of a parallel reduction. It is best to modify the reduction SDK sample to suit your needs.
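To show what the modification amounts to, here is a rough host-side sketch of the pairwise (tree) reduction pattern with the sum replaced by min and max. It is plain C++, not the SDK sample's code, and MinMax and minmax_tree_reduce are hypothetical names. In a kernel, each iteration of the inner loop would be done by one thread, with a __syncthreads() between passes.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

struct MinMax {
    float min;
    float max;
};

// Tree reduction for min and max over an array of any non-zero length.
// Each pass folds the upper half of the active range onto the lower half,
// so the active range shrinks from n to ceil(n / 2) until one element is left.
inline MinMax minmax_tree_reduce(const std::vector<float>& data) {
    if (data.empty()) {
        throw std::invalid_argument("minmax_tree_reduce: empty input");
    }
    std::vector<float> lo(data);  // running minima
    std::vector<float> hi(data);  // running maxima
    std::size_t n = data.size();
    while (n > 1) {
        std::size_t half = (n + 1) / 2;  // round up so odd sizes work
        // On the GPU, each value of i would be handled by a separate thread.
        for (std::size_t i = 0; i + half < n; ++i) {
            lo[i] = std::min(lo[i], lo[i + half]);
            hi[i] = std::max(hi[i], hi[i + half]);
        }
        n = half;  // a barrier would go here in a kernel
    }
    return MinMax{lo[0], hi[0]};
}
```

Rounding the half size up is what lets the length be arbitrary rather than a power of 2: with an odd count the middle element has no partner in that pass and is simply carried over to the next one.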