Most efficient RGB FFT convolution? How best to FFT 3-component images.


The existing FFT2D image convolver sample only works on a single channel. As I understand it, RGB images are traditionally FFT’ed once per channel, but this seems very inefficient. What is the fastest way to FFT-filter 3- or 4-component images with CUDA?

For example, can a second channel be processed in the imaginary part somehow? I’ve heard this can be done, but it’s not working for me. Is there a trick, e.g. massaging the input/output with some kind of offset or inversion?

An optimal RGB(A) convolver SDK sample would be great, for either mono or RGB(A) filter kernels.

The Numerical Recipes book discusses some of the methods used to do multiple FFTs at once, but as I vaguely recall, their examples may only cover 2-way FFTs. You may want to look there as a starting point until someone gives a more definitive answer for 3- or 4-way FFTs.


John Stone

Thanks, but I’m not looking to re-code the FFT (the math is beyond me). Does anyone have a sample, or some pointers on how to do it most efficiently with the current library?

As image filtering is such a common application, it’d also be great to see this in the next SDK.

You don’t have to recode the FFT. The methods described in the book mainly involve shuffling the input/output data in special ways before and after running the FFT, so they should work fine with cuFFT as-is.
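
To make the "second channel in the imaginary part" idea concrete, here is a minimal host-side sketch in plain C++. It assumes the filter kernel is real-valued and shared by both channels; in that case convolution is linear, so filtering z = a + i*b gives (a*h) + i*(b*h), and the two results can simply be read from the real and imaginary parts with no extra unshuffling. The naive `dft` below is only a stand-in for a complex-to-complex FFT call, and the names `dft`, `convolve_pair` and `convolve_direct` are made up for illustration; none of this is taken from the SDK sample.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplxd = std::complex<double>;

// Naive O(N^2) DFT, standing in for a complex-to-complex FFT.
// sign = -1 is the forward transform, sign = +1 the unnormalized inverse.
std::vector<cplxd> dft(const std::vector<cplxd>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplxd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplxd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * double((k * t) % n) / double(n);
            sum += x[t] * cplxd(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Circularly convolve two real channels with one real kernel using a single
// forward/inverse transform pair: channel a rides in the real part,
// channel b in the imaginary part.
void convolve_pair(const std::vector<double>& a, const std::vector<double>& b,
                   const std::vector<double>& kernel,
                   std::vector<double>& out_a, std::vector<double>& out_b) {
    const std::size_t n = a.size();
    assert(b.size() == n && kernel.size() == n);
    std::vector<cplxd> z(n), h(n);
    for (std::size_t i = 0; i < n; ++i) {
        z[i] = cplxd(a[i], b[i]);      // pack two channels into one complex signal
        h[i] = cplxd(kernel[i], 0.0);  // the kernel must be real for this to work
    }
    std::vector<cplxd> Z = dft(z, -1);
    const std::vector<cplxd> H = dft(h, -1);
    for (std::size_t i = 0; i < n; ++i) Z[i] *= H[i];  // pointwise product in frequency space
    const std::vector<cplxd> y = dft(Z, +1);
    out_a.resize(n);
    out_b.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out_a[i] = y[i].real() / double(n);  // filtered channel a
        out_b[i] = y[i].imag() / double(n);  // filtered channel b
    }
}

// Direct circular convolution, used only as a reference to check the result.
std::vector<double> convolve_direct(const std::vector<double>& x,
                                    const std::vector<double>& kernel) {
    const std::size_t n = x.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += x[(i + n - j) % n] * kernel[j];
    return y;
}
```

With this packing, an RGBA image would need two complex transform pairs (for instance R+iG and B+iA) instead of four real ones. If the kernel differs per channel, the pointwise product no longer keeps the channels apart and the spectra have to be separated first, which is the shuffling the book describes.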



The reshuffling is really messy for 2D transforms.
I would rearrange the data before/after calling the FFT transforms. If you take care of the memory coalescing, it should be pretty fast.
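
As a rough illustration of the rearranging step, the sketch below packs an interleaved 8-bit RGBA buffer into two complex planes (R+iG and B+iA) and unpacks them again. It is plain host C++ with hypothetical names (`pack_rgba`, `unpack_rgba`, `to_byte`), and the 8-bit interleaved layout is an assumption. On the GPU the same loop body would presumably become a kernel with one thread per pixel, so that neighbouring threads read neighbouring pixels and the accesses coalesce.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using cplxf = std::complex<float>;

// Split interleaved RGBA bytes into two complex planes: rg = R + iG, ba = B + iA.
void pack_rgba(const std::vector<std::uint8_t>& rgba,
               std::vector<cplxf>& rg, std::vector<cplxf>& ba) {
    const std::size_t pixels = rgba.size() / 4;
    rg.resize(pixels);
    ba.resize(pixels);
    for (std::size_t p = 0; p < pixels; ++p) {
        rg[p] = cplxf(float(rgba[4 * p + 0]), float(rgba[4 * p + 1]));
        ba[p] = cplxf(float(rgba[4 * p + 2]), float(rgba[4 * p + 3]));
    }
}

// Round and clamp a filtered value back into the 0..255 byte range.
std::uint8_t to_byte(float v) {
    const float clamped = std::min(255.0f, std::max(0.0f, std::round(v)));
    return static_cast<std::uint8_t>(clamped);
}

// Merge the two complex planes back into interleaved RGBA bytes.
std::vector<std::uint8_t> unpack_rgba(const std::vector<cplxf>& rg,
                                      const std::vector<cplxf>& ba) {
    const std::size_t pixels = std::min(rg.size(), ba.size());
    std::vector<std::uint8_t> rgba(4 * pixels);
    for (std::size_t p = 0; p < pixels; ++p) {
        rgba[4 * p + 0] = to_byte(rg[p].real());
        rgba[4 * p + 1] = to_byte(rg[p].imag());
        rgba[4 * p + 2] = to_byte(ba[p].real());
        rgba[4 * p + 3] = to_byte(ba[p].imag());
    }
    return rgba;
}
```

The two planes can then go through the complex transform one after the other or as a batch, and the unpack step runs once on the filtered result.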