Unfortunately, the memory requirement of an FFT is not a power of two, because the library requires one additional cufftComplex element.
I have had no problems with this so far, but now I need to modify the output of an FFT with a kernel.
For FFTs larger than 512 elements, the only option I see is a block size of 1,1,1 and a grid size of n,1,1, because the block size is limited to 512 threads on my 8600 GT.
I fear that this is very bad for performance, because I cannot use a good block size.
How do you deal with such a thing in general? It would be nice if the result of an FFT had a power-of-two size…
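
For reference, here is a minimal sketch of the pattern that is commonly used when the element count is not a multiple of the block size: round the grid size up and let the surplus threads do nothing. The kernel name scaleSpectrum, the helpers blocksFor and scaleOnDevice, and the block size of 256 are assumptions made for illustration, not something taken from the library. The scaling is only a placeholder for whatever modification is actually needed.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical example kernel: scales every complex element by a constant.
// Each thread handles one element; threads past the end of the buffer exit.
__global__ void scaleSpectrum(cufftComplex *data, int count, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {              // guard for the partially filled last block
        data[i].x *= factor;      // real part
        data[i].y *= factor;      // imaginary part
    }
}

// Number of blocks needed to cover `count` elements (integer round-up).
static int blocksFor(int count, int threadsPerBlock)
{
    return (count + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches the kernel over `count` elements of a device buffer.
cudaError_t scaleOnDevice(cufftComplex *d_data, int count, float factor)
{
    const int threadsPerBlock = 256;  // assumption: any value up to the 512 limit
    scaleSpectrum<<<blocksFor(count, threadsPerBlock), threadsPerBlock>>>(
        d_data, count, factor);
    return cudaGetLastError();        // reports launch configuration errors
}
```

Assuming the buffer holds 513 elements (512 plus the one additional cufftComplex), blocksFor(513, 256) gives 3 blocks, and only one thread of the last block passes the bounds check. The cost of the idle threads is small compared with running one thread per block.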