The bitonic sort demo in the CUDA SDK can sort at most 512 int elements. Why?
512 ints take only 4 * 512 = 2048 bytes of memory, and there are 16384 bytes of shared memory per block.
A grid can have up to 65535 blocks in each dimension, but the 512-element bitonic sort uses only 512 threads.
Here is the deviceQuery output:
Device 0: “GeForce 8800 GT”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536150016 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1512000 kilohertz
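The limit most likely comes from the launch configuration, not from memory. This assumes the demo uses the common single-block layout: one block sorts the whole array, one thread per element, and `__syncthreads()` separates the stages. `__syncthreads()` only synchronizes threads within a block, so under that assumption the array size is capped by "Maximum number of threads per block: 512" above. The sketch below is a plain C++ emulation of that structure, not the SDK's source. `bitonicSortOneBlock` and `kMaxThreadsPerBlock` are hypothetical names, and the loop over `tid` stands in for the threads of one block.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical constant mirroring "Maximum number of threads per block: 512".
constexpr int kMaxThreadsPerBlock = 512;

// Emulates a single-block bitonic sort (hypothetical helper, not SDK code).
// "Thread" tid owns element tid, so n elements need n threads in one block.
// Returns false if n is not a power of two or does not fit in one block.
bool bitonicSortOneBlock(std::vector<int>& shared) {
    const int n = static_cast<int>(shared.size());
    if (n == 0 || (n & (n - 1)) != 0 || n > kMaxThreadsPerBlock) {
        return false;
    }
    for (int k = 2; k <= n; k *= 2) {          // size of the bitonic sequences being merged
        for (int j = k / 2; j > 0; j /= 2) {   // compare distance within this merge step
            // On the GPU all tids run this body concurrently. The pairs
            // (tid, tid ^ j) are disjoint, so a sequential loop gives the same result.
            for (int tid = 0; tid < n; ++tid) {
                const int ixj = tid ^ j;
                if (ixj > tid) {
                    const bool ascending = (tid & k) == 0;
                    if (ascending ? shared[tid] > shared[ixj]
                                  : shared[tid] < shared[ixj]) {
                        std::swap(shared[tid], shared[ixj]);
                    }
                }
            }
            // __syncthreads() would sit here: every stage must finish
            // before the next one reads shared memory.
        }
    }
    return true;
}
```

Under this sketch, a 512-element vector is sorted, and a 1024-element vector is rejected because it would need 1024 threads in one block. Sorting more elements requires one of two changes. Each thread can handle several elements, in which case shared memory (roughly 16384 / 4 = 4096 ints) becomes the cap. Alternatively, a multi-block version can run the large-j stages as separate passes over global memory, since `__syncthreads()` cannot synchronize across blocks.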