Bitonic Sort with CUDA

Does anyone know how to implement bitonic sort with CUDA, or is there a reference I can read to learn it?

One of the CUDA samples implements bitonic sort and also lists some references:

You may also be interested in this paper:

It discusses a CUDA bitonic sort from the standpoint of an in-place algorithm.
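To make the idea concrete, here is a minimal sketch of a global-memory bitonic sort. It is not the code from the CUDA sample or from the paper, just a plain version of the sorting network with one kernel launch per compare-exchange step. The names bitonicStep and bitonicSortHost are made up for illustration, the input is padded to a power of two with INT_MAX, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <climits>
#include <vector>

// One compare-exchange step of the bitonic network (hypothetical helper).
// k: size of the bitonic sequences being merged, j: distance to the partner.
__global__ void bitonicStep(int *data, unsigned int n, unsigned int j, unsigned int k)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int partner = i ^ j;
    if (partner > i) {                     // each pair is handled once, by its lower index
        bool ascending = ((i & k) == 0);   // sort direction of this subsequence
        int a = data[i];
        int b = data[partner];
        if ((a > b) == ascending) {        // pair is out of order for this direction
            data[i] = b;
            data[partner] = a;
        }
    }
}

// Sorts a host vector in ascending order on the GPU (hypothetical helper).
// Assumption: the padded length fits comfortably in an unsigned int.
void bitonicSortHost(std::vector<int> &v)
{
    if (v.size() < 2) return;
    unsigned int n = 1;
    while (n < v.size()) n <<= 1;          // bitonic sort needs a power-of-two length
    std::vector<int> padded(v);
    padded.resize(n, INT_MAX);             // padding sorts to the end

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, padded.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    for (unsigned int k = 2; k <= n; k <<= 1)          // merge size
        for (unsigned int j = k >> 1; j > 0; j >>= 1)  // partner distance
            bitonicStep<<<blocks, threads>>>(d, n, j, k);

    cudaMemcpy(padded.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    std::copy(padded.begin(), padded.begin() + v.size(), v.begin());
}
```

Calling bitonicSortHost(v) on a std::vector<int> sorts it in place. A tuned implementation would typically do the small-distance steps in shared memory instead of launching a kernel for every step, so treat this only as a way to see the structure of the algorithm.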

Is there any sort method, on GPU or CPU, that can sort 50 thousand unordered integers within 1 ms? I am not sure that bitonic sort could finish sorting in 1 ms.
Many thanks for your help.

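Yes. Thrust sort (a radix sort, for ordinary data types) has performance over 500M keys/s, which would translate to over 500K keys/ms (32-bit quantities). At that rate, 50,000 keys would need roughly 0.1 ms, although an input this small may not reach peak throughput. This will depend to a large extent on which GPU you are running on: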

(figure 26.2)

Thrust ships as part of the CUDA toolkit:
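For reference, a minimal sketch of sorting integers with Thrust might look like the following. The helper name thrustSortOnDevice is made up for illustration; thrust::device_vector, thrust::sort and thrust::copy are the library calls.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper: copy host integers to the GPU, sort them there,
// and return the sorted result as a new host vector.
std::vector<int> thrustSortOnDevice(const std::vector<int> &host)
{
    thrust::device_vector<int> d(host.begin(), host.end());  // host -> device copy
    thrust::sort(d.begin(), d.end());                         // ascending sort on the device
    std::vector<int> out(host.size());
    thrust::copy(d.begin(), d.end(), out.begin());            // device -> host copy
    return out;
}
```

If the data already lives on the GPU, only the thrust::sort call is needed and the two copies go away.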

The fastest (easy-to-use, packaged) sort may be in CUB:
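As a rough sketch of the CUB route, cub::DeviceRadixSort::SortKeys is called twice: once with a null temporary-storage pointer to query how much scratch space it needs, and once to do the sort. The helper name cubSortOnDevice is made up for illustration, and error checking is omitted.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: sort host integers on the GPU with CUB's device-wide radix sort.
std::vector<int> cubSortOnDevice(const std::vector<int> &host)
{
    const int n = static_cast<int>(host.size());
    std::vector<int> out(host.size());
    if (n == 0) return out;

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call: d_temp is null, so CUB only reports the scratch size it needs.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    // Second call: same arguments with real scratch space, performs the sort.
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

In a loop that sorts many arrays of the same size, the scratch buffer and the device arrays can be allocated once and reused, which keeps cudaMalloc out of the timed path.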

Note that these performance numbers do not include the time “cost” of transferring data to/from the GPU. If you include these times, the performance will be considerably lower.
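One way to see the difference on your own GPU is to time the sort alone and the sort plus both copies with CUDA events. The sketch below does that around a Thrust sort; the struct SortTiming and the function timeThrustSort are made-up names, and it assumes everything runs on the default stream.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical result type: elapsed milliseconds for the two measurements.
struct SortTiming {
    float sortOnlyMs;       // sort on the device only
    float withTransfersMs;  // host->device copy + sort + device->host copy
};

// Hypothetical helper: sorts `host` in place via the GPU and reports both timings.
SortTiming timeThrustSort(std::vector<int> &host)
{
    cudaEvent_t e0, e1, e2, e3;
    cudaEventCreate(&e0);
    cudaEventCreate(&e1);
    cudaEventCreate(&e2);
    cudaEventCreate(&e3);

    thrust::device_vector<int> d(host.size());       // allocate before timing starts

    cudaEventRecord(e0);
    thrust::copy(host.begin(), host.end(), d.begin());   // host -> device
    cudaEventRecord(e1);
    thrust::sort(d.begin(), d.end());                     // sort on the device
    cudaEventRecord(e2);
    thrust::copy(d.begin(), d.end(), host.begin());       // device -> host
    cudaEventRecord(e3);
    cudaEventSynchronize(e3);

    SortTiming t;
    cudaEventElapsedTime(&t.sortOnlyMs, e1, e2);
    cudaEventElapsedTime(&t.withTransfersMs, e0, e3);

    cudaEventDestroy(e0);
    cudaEventDestroy(e1);
    cudaEventDestroy(e2);
    cudaEventDestroy(e3);
    return t;
}
```

The first call in a process also pays for one-time setup work, so it is worth calling the function once as a warm-up and looking at the numbers from a later call.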

Thanks a lot!!!