I ran into a strange problem when trying to sort the signal array for a Cooley-Tukey FFT. The program runs well if the number of points is < 2048. When the number of points increases to 4096 (set NUM_STAGES = 12 in the sort.cu file), it doesn't sort at all. The result can be checked by comparing cuda_result.txt, host_result.txt and orig_data.txt in the same directory. I am running the program with VC Express 2005 and SDK 1.0 in the EmuRelease configuration.
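
For reference, here is a minimal host-side sketch of the reordering, assuming the sort in question is the usual bit-reversal permutation for a radix-2 Cooley-Tukey FFT and that NUM_STAGES is log2 of the number of points. The function names `bit_reverse` and `bit_reverse_sort` are hypothetical and are not taken from sort.cu; this is only meant to show what the expected output looks like, not how the kernel does it.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of index i.
   Assumption: bits == NUM_STAGES, e.g. 12 for 4096 points. */
static unsigned int bit_reverse(unsigned int i, unsigned int bits)
{
    unsigned int r = 0;
    unsigned int b;
    for (b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);  /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* In-place bit-reversal reordering of n = 2^num_stages samples.
   Hypothetical helper; the real code may use a different data type. */
static void bit_reverse_sort(float *data, unsigned int num_stages)
{
    unsigned int n, i;
    assert(num_stages < 32u);     /* keep the shift below well defined */
    n = 1u << num_stages;
    for (i = 0; i < n; ++i) {
        unsigned int j = bit_reverse(i, num_stages);
        if (i < j) {              /* swap each pair only once */
            float tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

Calling `bit_reverse_sort(signal, 12)` on a 4096-element array should give the ordering to compare against cuda_result.txt. With 12 stages, index 1 should end up at position 2048.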