Bitonic Sort with CUDA

Does anyone know how to implement bitonic sort with CUDA, or is there a reference I can read to learn it?

One of the CUDA sample codes implements bitonic sort and also indicates some references:

http://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-sorting-networks

You may also be interested in this paper:

http://www.informatik.uni-kiel.de/fileadmin/arbeitsgruppen/comsys/files/public/ppam09.pdf

It discusses a CUDA bitonic sort from the standpoint of an in-place algorithm.
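For orientation, here is a minimal sketch of an in-place bitonic sort that works directly on global memory. It is not the code from the CUDA sample or from the paper; the names bitonicStep and bitonicSortPow2 are made up for this illustration, the input length is assumed to be a power of two, and error checking is kept to a minimum.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One compare-exchange step of the bitonic network.
// j is the distance to the partner element, k is the size of the
// bitonic sequences currently being merged. (Names are illustrative.)
__global__ void bitonicStep(int *data, unsigned int n, unsigned int j, unsigned int k)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int partner = i ^ j;
    if (partner > i) {                      // each pair is handled by its lower index
        bool ascending = ((i & k) == 0);    // sort direction of this subsequence
        int a = data[i];
        int b = data[partner];
        if ((a > b) == ascending) {         // out of order for this direction: swap
            data[i] = b;
            data[partner] = a;
        }
    }
}

// Sorts h in ascending order on the GPU.
// Assumption: h.size() is a power of two (at most 2^30).
// Returns false for unsupported sizes or on a CUDA error.
bool bitonicSortPow2(std::vector<int> &h)
{
    unsigned int n = static_cast<unsigned int>(h.size());
    if (n == 0 || (n & (n - 1)) != 0 || n > (1u << 30)) return false;

    size_t bytes = n * sizeof(int);
    int *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    for (unsigned int k = 2; k <= n; k <<= 1)          // size of merged sequences
        for (unsigned int j = k >> 1; j > 0; j >>= 1)  // partner distance
            bitonicStep<<<blocks, threads>>>(d, n, j, k);

    // The blocking copy also waits for the kernels launched above.
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```

An input whose length is not a power of two (50,000 elements, for example) could be padded with INT_MAX up to the next power of two (65,536) and the padding dropped after the sort. Launching one kernel per step is the simplest form; doing the small-distance steps in shared memory would likely be faster, but that is beyond this sketch.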

Is there any sort method, on GPU or CPU, that can sort 50 thousand unordered integers within
1 msec? I am not sure that bitonic sort could finish sorting in 1 msec.
Many thanks for your help.

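Yes. Thrust sort (a radix sort for ordinary data types) achieves over 500M keys/s, which translates to over 500K keys/ms for 32-bit keys. At that rate, 50 thousand keys would in principle take on the order of 0.1 ms, although a small input may not reach peak throughput. The exact figure depends to a large extent on which GPU you are running on: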

http://sbel.wisc.edu/Courses/ME964/Literature/thrustGPUgems2011.pdf

(figure 26.2)

Thrust ships as part of the CUDA toolkit:

https://github.com/thrust/thrust/wiki/Quick-Start-Guide
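As a rough illustration of how little code a Thrust sort needs, here is a minimal sketch. The wrapper name thrustSortCopy is made up for this example; the Thrust calls themselves (thrust::device_vector, thrust::sort, thrust::copy) are the standard ones.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Copies the keys to the GPU, sorts them in ascending order with
// thrust::sort, and returns the sorted keys in a new host vector.
// (thrustSortCopy is an illustrative name, not a Thrust function.)
std::vector<int> thrustSortCopy(const std::vector<int> &keys)
{
    // Constructing from a host range performs the host-to-device copy.
    thrust::device_vector<int> d(keys.begin(), keys.end());

    // Sorts on the device.
    thrust::sort(d.begin(), d.end());

    // Device-to-host copy of the sorted keys.
    std::vector<int> out(keys.size());
    thrust::copy(d.begin(), d.end(), out.begin());
    return out;
}
```

Compile it with nvcc as a .cu file. Note that the wrapper as written includes both transfers, so timing it measures more than the sort itself.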

The fastest (easy-to-use, packaged) sort may be in CUB:

http://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html
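A minimal sketch of the CUB route, assuming a CUB version that provides cub::DeviceRadixSort::SortKeys with the usual two-call pattern (the first call only queries the temporary storage size). The wrapper name cubSortKeys is made up for this example, and error checking is kept short.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Sorts h in ascending order with cub::DeviceRadixSort::SortKeys.
// (cubSortKeys is an illustrative name.) Returns false on a CUDA error.
bool cubSortKeys(std::vector<int> &h)
{
    int n = static_cast<int>(h.size());
    if (n == 0) return true;

    size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice);

    // First call: d_temp is null, so CUB only writes the required
    // temporary storage size into temp_bytes and does no sorting.
    void *d_temp = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);
    if (cudaMalloc(&d_temp, temp_bytes) != cudaSuccess) {
        cudaFree(d_out); cudaFree(d_in);
        return false;
    }

    // Second call: performs the sort from d_in into d_out.
    cub::DeviceRadixSort::SortKeys(d_temp, temp_bytes, d_in, d_out, n);

    cudaError_t err = cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_temp);
    cudaFree(d_out);
    cudaFree(d_in);
    return err == cudaSuccess;
}
```

If you sort repeatedly with the same input size, the temporary storage and the two device buffers can be allocated once and reused, which keeps the allocations out of the per-sort time.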

Note that these performance numbers do not include the time “cost” of transferring data to/from the GPU. If you include these times, the performance will be considerably lower.
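To see how much the transfers matter on your own GPU, you could time the sort alone and the sort plus both copies with CUDA events. The sketch below is one way to do that; SortTiming and timeSortWithTransfers are made-up names, and the sort used here is thrust::sort on a raw device pointer.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#include <vector>

// Illustrative result type: both times are in milliseconds.
struct SortTiming {
    float sort_ms;   // device sort only
    float total_ms;  // host-to-device copy + sort + device-to-host copy
};

// Sorts h in ascending order on the GPU and reports the two timings.
// Error checking is omitted to keep the sketch short.
SortTiming timeSortWithTransfers(std::vector<int> &h)
{
    size_t bytes = h.size() * sizeof(int);
    int *d = nullptr;
    cudaMalloc(&d, bytes);

    cudaEvent_t e0, e1, e2, e3;
    cudaEventCreate(&e0);
    cudaEventCreate(&e1);
    cudaEventCreate(&e2);
    cudaEventCreate(&e3);

    cudaEventRecord(e0);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(e1);

    thrust::device_ptr<int> p(d);          // wrap the raw pointer for Thrust
    thrust::sort(p, p + h.size());
    cudaEventRecord(e2);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(e3);
    cudaEventSynchronize(e3);              // wait until all events have completed

    SortTiming t;
    cudaEventElapsedTime(&t.sort_ms, e1, e2);
    cudaEventElapsedTime(&t.total_ms, e0, e3);

    cudaEventDestroy(e0);
    cudaEventDestroy(e1);
    cudaEventDestroy(e2);
    cudaEventDestroy(e3);
    cudaFree(d);
    return t;
}
```

The first call in a process usually includes one-time startup overhead, so it is worth running the function once as a warm-up and measuring a later call.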

Thanks a lot!!!