2048 points limitation in FFT sort?


I ran into a strange problem when trying to sort the signal array for a Cooley-Tukey FFT. The program runs well if the number of points is < 2048. When the number of points increases to 4096 (change NUM_STAGES = 12 in the sort.cu file), it doesn’t sort at all. The result can be checked by comparing cuda_result.txt, host_result.txt and orig_data.txt in the same directory. The program runs in VC Express 2005 with SDK 1.0 in the EmuRelease configuration.

Your problem is that you are asking the kernel to allocate a buffer in shared memory of size “mem_size”:

cuda_sort<<<grid, threads, mem_size>>> (d_signal, SIGNAL_SIZE, NUM_STAGES, 1);

Shared memory on the G80 is 16 KB.
When NUM_STAGES=11, mem_size = 2048 * 8 bytes = 16 KB; when NUM_STAGES=12, mem_size = 4096 * 8 bytes = 32 KB, which is twice the shared memory available.
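To make the arithmetic concrete, here is a minimal host-side sketch in plain C that you could run before the launch. It assumes each point takes 8 bytes (for example a complex value stored as two floats) and uses the 16 KB G80 figure quoted above. The names SHARED_MEM_PER_BLOCK, BYTES_PER_POINT, shared_bytes_needed and fits_in_shared are made up for illustration and are not part of your code or the SDK.

```c
#include <stddef.h>

/* Assumption: 16 KB of shared memory, the G80 figure quoted in this thread. */
#define SHARED_MEM_PER_BLOCK (16u * 1024u)

/* Assumption: each point occupies 8 bytes (e.g. two 4-byte floats). */
#define BYTES_PER_POINT 8u

/* Bytes of shared memory requested for 2^num_stages points. */
size_t shared_bytes_needed(unsigned num_stages)
{
    return ((size_t)1 << num_stages) * BYTES_PER_POINT;
}

/* Returns 1 if the request does not exceed the limit, 0 otherwise. */
int fits_in_shared(unsigned num_stages)
{
    return shared_bytes_needed(num_stages) <= SHARED_MEM_PER_BLOCK;
}
```

With these assumptions, NUM_STAGES=11 asks for exactly 16384 bytes and NUM_STAGES=12 asks for 32768 bytes, so the second request cannot fit. Note that this check is optimistic: on real G80 hardware a small part of shared memory is also used for kernel arguments, so a request for the full 16 KB may fail as well.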

You are not even using the shared memory you are allocating…

Just out of curiosity, why don’t you use the FFT functions from the CUFFT library?


Thanks for your answer. I got it working now. Just FYI, I am doing this for a course project to explore how to write CUDA code.