Has anyone had luck calling the cufft_c2c_radix4 kernel from the source posted a few weeks ago? For instance, if I try to compute 512 FFTs of 1024 points each like this:

    cufftStride inStride;
    inStride.ibStride = 1024;
    inStride.ieStride = 1;
    inStride.obStride = 1024;
    inStride.oeStride = 1;
    cufft_c2c_radix4<<<512,1024/4,1024*sizeof(cData)>>>(1024, TP/1024.0, 10, inbuf, outbuf, -1, inStride);

only the ‘even’ samples in outbuf appear to be correct. Has anyone else noticed this? I tried it both on an 8600 and a C870, with the same results. Thanks,
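
A quick way to confirm a symptom like this is to compare the kernel output against a trusted reference and count mismatches separately for even and odd sample indices. The following is only a host-side sketch in plain C: it assumes `cData` is a pair of floats (like CUDA's `float2`), and `ParityErrors` and `count_parity_errors` are hypothetical names that are not part of the posted source.

```c
#include <stddef.h>

/* Assumption: cData is a pair of floats (real, imaginary), like float2. */
typedef struct { float x, y; } cData;

/* Hypothetical result type: mismatch counts at even and odd indices. */
typedef struct { size_t even, odd; } ParityErrors;

/* Absolute difference without pulling in the math library. */
static float abs_diff(float a, float b)
{
    float d = a - b;
    return d < 0.0f ? -d : d;
}

/* Compare n samples of got against ref. A sample is a mismatch if either
 * component differs by more than tol (NaN also counts as a mismatch). */
static ParityErrors count_parity_errors(const cData *got, const cData *ref,
                                        size_t n, float tol)
{
    ParityErrors e = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        int ok = abs_diff(got[i].x, ref[i].x) <= tol &&
                 abs_diff(got[i].y, ref[i].y) <= tol;
        if (!ok) {
            if (i % 2 == 0)
                e.even++;
            else
                e.odd++;
        }
    }
    return e;
}
```

If only the odd count is non-zero after copying outbuf back to the host, that matches the behaviour described above.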


I just figured out what was wrong today. Look for the following line in the radix4 code:

    c = base - 1;

Change it to

    c = base - 2;

That is right!

Thank you,


Can someone post the full code for this? I am new to CUDA and am trying to do something similar using cufft_c2c_radix2, but I can't seem to get the memory set up right.
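
On the memory layout, here is a minimal host-side sketch in plain C of what the stride fields in the launch earlier in the thread seem to describe. It assumes that ibStride/obStride are the distance in elements between consecutive transforms and ieStride/oeStride the distance between consecutive samples of one transform; the thread does not confirm this. `StrideSketch`, `sample_index` and `buffer_elems` are hypothetical names, not part of the posted source.

```c
#include <stddef.h>

/* Stand-in for the thread's cufftStride; field types are an assumption. */
typedef struct {
    size_t ibStride, ieStride;   /* input: batch stride, element stride  */
    size_t obStride, oeStride;   /* output: batch stride, element stride */
} StrideSketch;

/* Offset (in elements) of sample `elem` of transform `batch`,
 * under the assumed meaning of the stride fields. */
static size_t sample_index(size_t batch, size_t elem,
                           size_t bStride, size_t eStride)
{
    return batch * bStride + elem * eStride;
}

/* Smallest buffer length (in elements) that holds `batches` transforms
 * of `n` samples each with the given strides. */
static size_t buffer_elems(size_t batches, size_t n,
                           size_t bStride, size_t eStride)
{
    if (batches == 0 || n == 0)
        return 0;
    return sample_index(batches - 1, n - 1, bStride, eStride) + 1;
}
```

With the values from the first post (512 transforms, 1024 points, batch stride 1024, element stride 1), `buffer_elems` gives 512 * 1024 elements, so each of inbuf and outbuf would need that many elements times `sizeof(cData)` bytes on the device.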