Bitonic Sort with CUDA

Does anyone know how to implement bitonic sort with CUDA, or is there a reference I can read to learn it?

One of the CUDA sample codes implements bitonic sort and also indicates some references:

[url]CUDA Samples :: CUDA Toolkit Documentation[/url]

You may also be interested in this paper:

[url]http://www.informatik.uni-kiel.de/fileadmin/arbeitsgruppen/comsys/files/public/ppam09.pdf[/url]

It discusses a CUDA bitonic sort from the standpoint of an in-place algorithm.
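For orientation, here is a minimal sketch of a global-memory bitonic sort. It is not the code from the CUDA sample or from the paper; the names `bitonic_step` and `bitonic_sort` are made up for illustration, the input is padded to the next power of two with `INT_MAX`, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <climits>
#include <vector>

// One compare-exchange pass of the bitonic network over a global-memory array.
// k is the size of the bitonic sequences being merged; j is the partner distance.
__global__ void bitonic_step(int *data, unsigned int n, unsigned int j, unsigned int k)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    unsigned int partner = i ^ j;
    if (partner > i) {                        // the lower-indexed thread handles each pair
        bool ascending = ((i & k) == 0);      // sort direction of this subsequence
        int a = data[i];
        int b = data[partner];
        if ((a > b) == ascending) {           // out of order for this direction: swap
            data[i] = b;
            data[partner] = a;
        }
    }
}

// Hypothetical host helper: sorts `keys` in ascending order on the GPU.
// Assumption: padding with INT_MAX is acceptable, since the padding ends up
// at the tail of the sorted array and is dropped on copy-back.
void bitonic_sort(std::vector<int> &keys)
{
    if (keys.size() < 2) return;

    unsigned int n = 1;
    while (n < keys.size()) n <<= 1;          // next power of two

    std::vector<int> padded(keys);
    padded.resize(n, INT_MAX);

    int *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemcpy(d_data, padded.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    for (unsigned int k = 2; k <= n; k <<= 1)           // size of sequences being merged
        for (unsigned int j = k >> 1; j > 0; j >>= 1)   // compare distance within a merge
            bitonic_step<<<blocks, threads>>>(d_data, n, j, k);

    // This copy runs on the default stream, so it waits for all kernels above.
    cudaMemcpy(padded.data(), d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    std::copy(padded.begin(), padded.begin() + keys.size(), keys.begin());
}
```

Calling `bitonic_sort(v)` on a `std::vector<int>` sorts it in place. Each pass here is a separate kernel launch, which keeps the sketch short; faster implementations typically run the small-`j` passes in shared memory within a single kernel.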

Are there any sorting methods, on GPU or CPU, that can sort 50 thousand unordered integers within
1 ms? I am not sure that bitonic sort could finish within 1 ms.
Many thanks for your help.

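Yes, Thrust's sort (a radix sort, for ordinary data types) has performance of over 500M keys/s for 32-bit quantities, which translates to over 500K keys/ms. At that rate, 50,000 keys would nominally take about 0.1 ms, although small arrays may not reach peak throughput. This will also depend to a large extent on which GPU you are running on: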

[url]http://sbel.wisc.edu/Courses/ME964/Literature/thrustGPUgems2011.pdf[/url]

(figure 26.2)

Thrust ships as part of the CUDA toolkit:

[url]GitHub - NVIDIA/thrust: The C++ parallel algorithms library.[/url]
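As a rough illustration of how little code the Thrust route needs, here is a minimal sketch. The wrapper name `thrust_sort_keys` is made up for this example; the Thrust calls themselves (`thrust::device_vector`, `thrust::sort`, `thrust::copy`) are the standard ones.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical wrapper: copies the keys to the GPU, sorts them there,
// and copies the sorted result back into the same host vector.
void thrust_sort_keys(std::vector<int> &keys)
{
    thrust::device_vector<int> d_keys(keys.begin(), keys.end()); // host -> device
    thrust::sort(d_keys.begin(), d_keys.end());                  // sort on the device
    thrust::copy(d_keys.begin(), d_keys.end(), keys.begin());    // device -> host
}
```

If your data already lives on the GPU, you would skip the two copies and call `thrust::sort` directly on the device range.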

The fastest (easy-to-use, packaged) sort may be in CUB:

[url]http://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html[/url]
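For comparison, a minimal sketch of the CUB route looks like the following. The wrapper name `cub_sort_keys` is made up, and error checking is left out. `cub::DeviceRadixSort::SortKeys` is called twice: the first call, with a null temporary-storage pointer, only reports how much scratch memory is needed.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical wrapper: sorts `keys` in ascending order with CUB's device-wide radix sort.
void cub_sort_keys(std::vector<int> &keys)
{
    const int n = static_cast<int>(keys.size());
    if (n == 0) return;

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, keys.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // First call: d_temp is null, so CUB only writes the required size to temp_bytes.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    // Second call: performs the actual sort using the scratch buffer.
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    cudaMemcpy(keys.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
}
```

In a real-time loop you would probably allocate `d_in`, `d_out`, and the scratch buffer once up front and reuse them, since `cudaMalloc` itself can take a noticeable fraction of a 1 ms budget.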

Note that these performance numbers do not include the time “cost” of transferring data to/from the GPU. If you include these times, the performance will be considerably lower.
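To see how much the transfers matter on your own hardware, you could time both regions with CUDA events. The sketch below is one way to do that; `SortTimings` and `time_sort` are made-up names, and the numbers it returns depend entirely on your GPU, driver, and bus.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical result type: elapsed milliseconds with and without the transfers.
struct SortTimings {
    float total_ms; // host->device copy + sort + device->host copy
    float sort_ms;  // sort only
};

// Hypothetical helper: sorts `keys` on the GPU and reports both timings.
SortTimings time_sort(std::vector<int> &keys)
{
    cudaEvent_t start, sort_begin, sort_end, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&sort_begin);
    cudaEventCreate(&sort_end);
    cudaEventCreate(&stop);

    // Allocate before timing so the allocation cost is not counted.
    thrust::device_vector<int> d_keys(keys.size());

    cudaEventRecord(start);
    thrust::copy(keys.begin(), keys.end(), d_keys.begin());   // host -> device
    cudaEventRecord(sort_begin);
    thrust::sort(d_keys.begin(), d_keys.end());
    cudaEventRecord(sort_end);
    thrust::copy(d_keys.begin(), d_keys.end(), keys.begin()); // device -> host
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    SortTimings t{};
    cudaEventElapsedTime(&t.total_ms, start, stop);
    cudaEventElapsedTime(&t.sort_ms, sort_begin, sort_end);

    cudaEventDestroy(start);
    cudaEventDestroy(sort_begin);
    cudaEventDestroy(sort_end);
    cudaEventDestroy(stop);
    return t;
}
```

The first call in a process also pays for CUDA context creation and other one-time setup, so it is worth running the function a few times and discarding the first measurement.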

Thanks a lot!!!