Bitonic Sort in SDK demo: what's the bottleneck?

fzhsheng · May 16, 2008, 2:25am

The bitonic sort demo in the CUDA SDK can sort at most 512 int elements. Why?

512 ints take only 4 * 512 = 2048 bytes of memory, and there are 16384 bytes of shared memory.

A grid can hold up to 65535 blocks per dimension, but the 512-element bitonic sort uses only 512 threads.

See the deviceQuery result:

Device 0: “GeForce 8800 GT”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536150016 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1512000 kilohertz

E.D_Riedijk · May 16, 2008, 3:57am

Because shared memory can only be accessed by the threads of a single block, and the maximum number of threads per block is 512. The demo assigns one thread per element, so a single block can sort at most 512 elements.
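To make the two limits concrete, here is a minimal sketch that queries them through the CUDA runtime. The helper name `maxElementsPerBlock` is hypothetical (not part of the SDK), and it assumes one thread per element with every element held in shared memory, as in the demo.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: upper bound on the number of ints one block can sort
// when each thread owns exactly one element and all elements sit in shared
// memory. Returns -1 if the device cannot be queried.
int maxElementsPerBlock(int device)
{
    assert(device >= 0);

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;

    // Limit imposed by the thread count (512 on the 8800 GT listed above).
    int byThreads = prop.maxThreadsPerBlock;
    // Limit imposed by shared memory (16384 / 4 = 4096 on the 8800 GT).
    int byShared = (int)(prop.sharedMemPerBlock / sizeof(int));

    printf("by threads: %d, by shared memory: %d\n", byThreads, byShared);
    return byThreads < byShared ? byThreads : byShared;
}
```

On the device listed above the thread limit is the smaller of the two, which is why the demo stops at 512 elements even though shared memory alone could hold more.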

alex_zip · June 26, 2008, 11:18am

Hi, I’m learning CUDA and I’m interested in bitonic sort. How can I solve this problem? I want to modify the bitonic sort example so that it sorts more than 512 elements. Thanks.

Simon_Green · June 26, 2008, 6:34pm

One simple (although not very efficient) method is to use multiple passes of the per-block bitonic sort, and offset the start of the blocks by half the block size on the even passes. The offset allows communication across the block boundaries. This is a kind of hybrid odd-even bitonic sort.

Anyway, I would recommend reading about parallel sorting networks.
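The multi-pass scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the SDK's code: the names `BLOCK`, `bitonicSortBlock`, and `sortOnDevice` are made up here, the input length is assumed to be a multiple of `BLOCK`, and the pass count of 2 * (number of blocks) is a worst-case bound borrowed from odd-even transposition sort over half-blocks.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 512  // elements (and threads) per block; must be a power of two

// Sorts BLOCK consecutive ints, ascending, starting at
// data[offset + blockIdx.x * BLOCK]. One thread per element.
__global__ void bitonicSortBlock(int *data, int offset)
{
    __shared__ int s[BLOCK];
    const unsigned int tid = threadIdx.x;
    int *base = data + offset + blockIdx.x * BLOCK;

    s[tid] = base[tid];
    __syncthreads();

    for (unsigned int k = 2; k <= BLOCK; k *= 2) {        // size of sorted runs
        for (unsigned int j = k / 2; j > 0; j /= 2) {     // compare distance
            unsigned int ixj = tid ^ j;
            if (ixj > tid) {                               // lower thread of the pair swaps
                bool ascending = ((tid & k) == 0);
                int a = s[tid], b = s[ixj];
                if (ascending ? (a > b) : (a < b)) { s[tid] = b; s[ixj] = a; }
            }
            __syncthreads();
        }
    }
    base[tid] = s[tid];
}

// Hypothetical host helper: sorts v on the GPU. v.size() must be a
// non-zero multiple of BLOCK. Returns false on a CUDA error.
bool sortOnDevice(std::vector<int> &v)
{
    const int n = (int)v.size();
    assert(n > 0 && n % BLOCK == 0);
    const int numBlocks = n / BLOCK;
    const size_t bytes = (size_t)n * sizeof(int);

    int *d = NULL;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return false;
    }

    // Alternate aligned passes with passes shifted by half a block, so that
    // values can cross block boundaries. The shifted pass leaves the first
    // and last half-block untouched and therefore uses one block fewer.
    for (int pass = 0; pass < 2 * numBlocks; ++pass) {
        if (pass % 2 == 0)
            bitonicSortBlock<<<numBlocks, BLOCK>>>(d, 0);
        else if (numBlocks > 1)
            bitonicSortBlock<<<numBlocks - 1, BLOCK>>>(d, BLOCK / 2);
    }

    cudaError_t err = cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```

Calling `sortOnDevice` on a `std::vector<int>` of, say, 8 * 512 elements runs 16 passes. The pass count grows linearly with the input size, which is why this approach is simple but not very efficient.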
