2048 points limitation in FFT sort?


I have a strange problem when trying to sort the signal array for a Cooley-Tukey FFT. The program runs well if the number of points is < 2048. When the number of points increases to 4096 (change NUM_STAGES = 12 in the sort.cu file), it doesn’t sort at all. The result can be checked by comparing cuda_result.txt, host_result.txt and orig_data.txt in the same directory. I am building with VC Express 2005 and SDK 1.0 in the EmuRelease configuration.

SORT.zip (28.3 KB)

Your problem is that you are asking the kernel to allocate a buffer in shared memory of size “mem_size”:

cuda_sort<<<grid, threads, mem_size>>> (d_signal, SIGNAL_SIZE, NUM_STAGES, 1);

Shared memory on the G80 is 16 KB per multiprocessor.
When NUM_STAGES=11, mem_size = 2048 * 8 bytes = 16 KB; when NUM_STAGES=12, mem_size = 4096 * 8 bytes = 32 KB, which is twice the limit.
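
If it helps, the arithmetic above can be turned into a small host-side check that you run before launching the kernel. This is only a sketch: the helper names (signal_size, shared_bytes_requested, fits_in_shared) and the two constants are my own and not taken from your sort.cu, and I am assuming each point is a complex float, i.e. the 8 bytes per point used above.

```c
#include <stddef.h>

/* Assumption: one point = complex float = 2 x 4 bytes, matching the
   "8 bytes" per point in the arithmetic above. */
#define BYTES_PER_POINT 8u

/* Shared memory size on the G80, as stated above. */
#define G80_SHARED_BYTES (16u * 1024u)

/* Number of points in a radix-2 FFT with the given number of stages. */
size_t signal_size(unsigned num_stages)
{
    return (size_t)1 << num_stages;
}

/* Bytes of shared memory the launch would request for that many points. */
size_t shared_bytes_requested(unsigned num_stages)
{
    return signal_size(num_stages) * BYTES_PER_POINT;
}

/* Returns 1 if the request does not exceed 16 KB, 0 otherwise. */
int fits_in_shared(unsigned num_stages)
{
    return shared_bytes_requested(num_stages) <= G80_SHARED_BYTES;
}
```

Calling fits_in_shared(NUM_STAGES) on the host before the launch would flag the 12-stage case. Treat 16 KB as an upper bound only, since a real launch may have slightly less than that available to the kernel.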

You are not even using the shared memory you are allocating…

Just out of curiosity, why don’t you use the FFT functions from the CUFFT library?


Thanks for your answer. I got it working now. Just FYI, I am doing this for a course project, to learn how to code in CUDA.