The existing FFT2D image convolver sample only works on a single channel. As I understand it, RGB images are traditionally FFT'ed once per channel, but that seems very inefficient. What is the fastest way to FFT-filter 3- or 4-component images with CUDA?
For example, can a second channel be carried in the imaginary part somehow? I've heard this can be done, but I can't get it to work. Is there a trick, e.g. massaging the input/output with some kind of offset or inversion?
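In case it helps pin down what I'm attempting: my understanding is the standard "two-for-one" real-FFT trick, where two real channels are packed as the real and imaginary parts of one complex signal, transformed once, and then separated in the frequency domain using Hermitian symmetry. Here is a minimal 1-D sketch in plain Python (a toy DFT stands in for cuFFT; the channel data is made up):

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, standing in for a cuFFT call."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n))
            for k in range(n)]

# Two real "channels" (think one scanline of R and one of G),
# packed into a single complex signal: z = r + i*g.
r = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
g = [0.0, 1.0, -1.0, 2.0, 0.5, -0.5, 1.0, 2.5]
z = [a + 1j * b for a, b in zip(r, g)]

# One complex FFT instead of two real ones.
Z = dft(z)
n = len(Z)

# Unpack the two spectra via Hermitian symmetry:
#   R[k] = (Z[k] + conj(Z[n-k])) / 2
#   G[k] = (Z[k] - conj(Z[n-k])) / (2i)
R = [(Z[k] + Z[-k % n].conjugate()) / 2 for k in range(n)]
G = [(Z[k] - Z[-k % n].conjugate()) / (2j) for k in range(n)]
```

After unpacking, each spectrum can be multiplied by the filter's spectrum, repacked the same way, and inverse-transformed once, recovering one filtered channel from the real part and the other from the imaginary part. Is this the right approach, or is there a subtlety in 2D that I'm missing?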
An optimized RGB(A) convolver SDK sample would be great, supporting either mono or RGB(A) filter kernels.