I need to perform many small 2D FFTs. I can batch FFTs, but only 1D ones.
Ideally I would parallelize across the batch for 2D FFTs as well, but
that is not supported. Would it be possible to get some source code for this?
What other solutions are there? Thanks.
I am doing many small 2D FFTs in my code, too. The overhead of doing them separately was killing performance, so I wrote a batched 2D implementation on top of cuFFT. The source is posted in one of the CUDA subforums. Search for “batched” and you should be able to find it. I will try to find it and post a link.
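
For reference, one common way to get a batched 2D FFT out of a batched 1D routine is the row-column decomposition: transform all rows of all images in one batched 1D call, transpose each image, transform again in one batched 1D call, and transpose back. The sketch below is plain Python with a naive DFT standing in for the batched 1D library call, so it only illustrates the data flow. It is not the cuFFT-based implementation mentioned above, and every function name in it is made up for the example.

```python
import cmath


def dft_1d_batched(rows):
    """Naive O(n^2) DFT of every row. Hypothetical stand-in for a batched 1D FFT call."""
    out = []
    for row in rows:
        n = len(row)
        out.append([
            sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)
        ])
    return out


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def fft2_batched(images):
    """2D DFT of each H x W image in `images`, using only two batched 1D passes."""
    b, h, w = len(images), len(images[0]), len(images[0][0])

    # Pass 1: one batch of b*h transforms of length w (all rows of all images).
    stacked = dft_1d_batched([row for img in images for row in img])

    # Transpose each image so its columns become rows (each is now w x h).
    transposed = [transpose(stacked[i * h:(i + 1) * h]) for i in range(b)]

    # Pass 2: one batch of b*w transforms of length h (the former columns).
    stacked = dft_1d_batched([row for img in transposed for row in img])

    # Transpose back to the original h x w layout.
    return [transpose(stacked[i * w:(i + 1) * w]) for i in range(b)]


def dft_2d_direct(img):
    """Direct 2D DFT from the definition, used only to check the result."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h) for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]
```

On a GPU, the two `dft_1d_batched` calls would presumably become two batched 1D transforms, with the transposes handled by a separate kernel in between, so the whole batch costs a fixed number of launches instead of one 2D transform per image.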