Hi

I’m trying to port a program designed for CUDA to an FPGA, and it involves a lot of FFTs of images.

The cuFFT documentation says that a 2D R2C transform of N1*N2 real input produces N1*(N2/2+1) complex output, because it skips the part that is redundant under Hermitian symmetry, and that a 2D C2R transform takes N1*(N2/2+1) complex input and produces N1*N2 real output.
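To make the sizes concrete, here is a minimal sketch in plain Python (a naive DFT, not cuFFT; `dft2` and the test image are my own illustrative choices). It checks that the 2D DFT of real data satisfies X[k1][k2] = conj(X[-k1][-k2]), which is why keeping N2/2+1 columns is enough.

```python
import cmath

def dft2(x):
    """Naive forward 2D DFT of an N1 x N2 list of lists (illustration only, O(N^4))."""
    n1, n2 = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (k1 * a / n1 + k2 * b / n2))
                 for a in range(n1) for b in range(n2))
             for k2 in range(n2)]
            for k1 in range(n1)]

n1, n2 = 4, 6
# Arbitrary real test image (values are just a deterministic pattern).
x = [[float((3 * a + 5 * b) % 7) for b in range(n2)] for a in range(n1)]
full = dft2(x)

# Largest violation of X[k1][k2] == conj(X[-k1][-k2]), indices taken modulo N1 and N2.
sym_err = max(abs(full[k1][k2] - full[(-k1) % n1][(-k2) % n2].conjugate())
              for k1 in range(n1) for k2 in range(n2))

# The non-redundant part kept by an R2C transform: N1 rows, N2/2+1 columns.
half = [row[: n2 // 2 + 1] for row in full]
print(len(half), len(half[0]), sym_err)
```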

So, as in FFTW, the FFTs along one dimension of the 2D R2C transform take advantage of Hermitian symmetry and use FFTs with half the original number of points, and the FFTs along the other dimension are ordinary complex FFTs. This gives the N1*(N2/2+1) output, and I have simulated it correctly in Matlab.
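The two-pass structure I mean can be sketched like this in plain Python. This is my own model, not cuFFT's actual code: I assume the halved dimension is the last one (N2), as the N1*(N2/2+1) size suggests, and I use a naive full-length DFT where a real implementation would use the half-length trick. The names `dft` and `r2c_2d` are hypothetical.

```python
import cmath

def dft(v):
    """Naive forward 1D DFT (illustration only)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def r2c_2d(x):
    """2D real-to-complex DFT by row/column passes; returns N1 x (N2//2+1) values."""
    n1, n2 = len(x), len(x[0])
    h = n2 // 2 + 1
    # Pass 1: transform each real row and keep only the first N2/2+1 bins,
    # since the remaining bins are conjugates of the kept ones.
    rows = [dft(row)[:h] for row in x]
    # Pass 2: ordinary complex DFT down each of the N2/2+1 remaining columns.
    cols = [dft([rows[a][k2] for a in range(n1)]) for k2 in range(h)]
    # Transpose back so the result is indexed as X[k1][k2].
    return [[cols[k2][k1] for k2 in range(h)] for k1 in range(n1)]

X = r2c_2d([[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0]])
print(len(X), len(X[0]))
```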

But I’m stuck on the inverse 2D C2R FFT. It takes N1*(N2/2+1) complex numbers as input, so the horizontal FFTs should use the Hermitian symmetry reduction method and the vertical FFTs should be ordinary FFTs. But no matter how I order the input or interchange the FFT methods, my Matlab simulation doesn’t produce the same result as cuFFT.

My Matlab design is based on this method: http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input

The page has some mistakes, but I found them and verified that the method works perfectly for 1D FFTs.

For proper input, i.e. input produced by a 2D R2C FFT, both cuFFT and my Matlab simulation can transform it back correctly. But when the input is random, the outputs are different (except for the top row, excluding its first number).

cuFFT must be using some optimization that assumes the input data comes from a proper 2D FFT, because when the input is random the output is actually wrong.
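Here is a small plain-Python sketch of one way this kind of mismatch can arise. It is a guess at the mechanism, not a description of cuFFT's internals. Even the N1*(N2/2+1) half spectrum still contains redundancy: in column k2 = 0 (and k2 = N2/2 when N2 is even), a proper spectrum satisfies X[k1][k2] = conj(X[-k1][k2]). Two inverse routines that trust different entries in those columns agree on proper input and disagree on random input. All function names are hypothetical.

```python
import cmath
import random

def expand(half, n2):
    """Rebuild the full N1 x N2 spectrum using X[k1][k2] = conj(X[-k1][-k2])."""
    n1, h = len(half), n2 // 2 + 1
    return [[half[k1][k2] if k2 < h else half[(-k1) % n1][n2 - k2].conjugate()
             for k2 in range(n2)] for k1 in range(n1)]

def idft2_real(full):
    """Real part of the unnormalized inverse 2D DFT (naive, illustration only)."""
    n1, n2 = len(full), len(full[0])
    return [[sum(full[k1][k2] * cmath.exp(2j * cmath.pi * (k1 * a / n1 + k2 * b / n2))
                 for k1 in range(n1) for k2 in range(n2)).real
             for b in range(n2)] for a in range(n1)]

def symmetrize_edges(half, n2):
    """Hypothetical variant: in the self-conjugate columns, trust only k1 <= N1/2
    and overwrite the remaining entries with their conjugate partners."""
    n1 = len(half)
    out = [row[:] for row in half]
    for c in [0] + ([n2 // 2] if n2 % 2 == 0 else []):
        for k1 in range(n1 // 2 + 1, n1):
            out[k1][c] = out[n1 - k1][c].conjugate()
    return out

def max_diff(half, n2):
    """Largest output difference between the two inverse variants."""
    a = idft2_real(expand(half, n2))
    b = idft2_real(expand(symmetrize_edges(half, n2), n2))
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

random.seed(0)
n1, n2 = 4, 4
h = n2 // 2 + 1
x = [[random.random() for _ in range(n2)] for _ in range(n1)]
# Proper half spectrum: forward DFT of real data, first N2/2+1 columns only.
valid = [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (k1 * a / n1 + k2 * b / n2))
              for a in range(n1) for b in range(n2))
          for k2 in range(h)] for k1 in range(n1)]
# Random half spectrum with no symmetry at all.
rand = [[complex(random.random(), random.random()) for _ in range(h)] for _ in range(n1)]

print(max_diff(valid, n2), max_diff(rand, n2))
```

If something like this is the cause, symmetrizing those columns myself after the frequency-domain processing would make the C2R input well defined again.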

Why do I care? Because I’m doing some processing in the frequency domain so I’m not sure the input for C2R will be properly laid out.

Does anyone know what’s going on behind the 2D C2R FFT in cuFFT? Thank you!