sorting on the GPU

alex.be · May 15, 2007, 1:39pm

I am curious which algorithms people here use to sort data on the GPU. The bitonic sort example NVIDIA provides in the template projects only works when the number of elements equals the number of threads, and so has some serious limitations (at most 512 elements to sort, and then only 16 registers available per thread).

Has anybody tackled this problem yet? I have a big array of 1024-2048 elements that I want to sort as fast as possible. The variations of bin sort, bubble sort and selection sort I tried were far from satisfactory, running 6 to 10 times slower than bitonic sort.

I have read about adaptive bitonic sort, but I cannot find any pseudocode or an implementation. Does anyone know of one?

Simon_Green · May 15, 2007, 2:09pm

To sort larger arrays you would have each block sort a subset of the array in shared memory, and then merge the sorted sub-arrays. Unfortunately doing a merge efficiently in parallel is not easy.
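As a rough illustration of the two phases described above, here is a minimal CPU-side sketch in C++ (not CUDA, and not the SDK sample). It assumes a tile size of 512, borrowed from the thread limit mentioned earlier, and an input whose length is a multiple of that tile size; the names `kTile`, `bitonic_sort_tile` and `tiled_sort` are made up for this example. Each tile stands in for the slice one block would sort in shared memory, and the merge phase is done serially with `std::inplace_merge`, which is exactly the part that is hard to do efficiently in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tile size: one tile models the slice a single block would keep
// in shared memory. The network below needs a power of two.
constexpr std::size_t kTile = 512;

// Bitonic sorting network over data[0..n), n a power of two.
// Iterations of the innermost loop are independent of each other, so on a GPU
// each could be handled by a thread, with a barrier between passes.
void bitonic_sort_tile(int* data, std::size_t n) {
    for (std::size_t k = 2; k <= n; k <<= 1) {          // bitonic sequence size
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {  // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((data[i] > data[partner]) == ascending) {
                        std::swap(data[i], data[partner]);
                    }
                }
            }
        }
    }
}

// Sorts v in ascending order. Assumes v.size() is a multiple of kTile.
void tiled_sort(std::vector<int>& v) {
    const std::size_t n = v.size();
    assert(n % kTile == 0);

    // Phase 1: sort every tile independently (one block per tile on a GPU).
    for (std::size_t base = 0; base < n; base += kTile) {
        bitonic_sort_tile(v.data() + base, kTile);
    }

    // Phase 2: merge sorted runs pairwise, doubling the run width each pass.
    for (std::size_t width = kTile; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            const auto first = v.begin();
            std::inplace_merge(first + static_cast<std::ptrdiff_t>(lo),
                               first + static_cast<std::ptrdiff_t>(mid),
                               first + static_cast<std::ptrdiff_t>(hi));
        }
    }
}
```

For a 2048-element array this gives four sorted tiles followed by two merge passes. In the tile sort, letting each thread loop over several values of `i` instead of exactly one is one way around the "n elements = n threads" restriction.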

Here’s a reference on adaptive bitonic sort:
Institute of Computer Science II

GPUsort uses a similar algorithm and you can download their implementation here:
http://gamma.cs.unc.edu/GPUSORT/index.html

We have an efficient CUDA implementation of radix sort which should be in a future release of the SDK.

Mu-Chi_Sung · May 20, 2007, 1:02pm

I did implement a “load-balanced” radix sort using CUDA. However, it’s pretty slow since I am new to CUDA. Can you tell us what kind of radix sort will be used in the next release? I just want to make sure I am digging in the right direction. Thanks!
