Implementation behind the 2D C2R FFT?

Hi

I’m trying to move a program designed for CUDA to an FPGA, and it involves a lot of FFTs of images.
The cuFFT documentation says that a 2D R2C transform of an N1*N2 real input produces N1*(N2/2+1) complex outputs, because it skips the part that is redundant by Hermitian symmetry, and that a 2D C2R transform takes N1*(N2/2+1) complex inputs and produces N1*N2 real outputs.
So, same as in FFTW, the first pass of FFTs in the 2D R2C takes advantage of Hermitian symmetry and computes each real-input FFT with a half-length FFT, and the second pass uses ordinary complex FFTs. This gives the N1*(N2/2+1) output, and I have simulated it correctly in Matlab.
But I’m stuck with the inverse 2D C2R FFT. It takes N1*(N2/2+1) complex inputs, so the horizontal FFTs should use the Hermitian symmetry reduction method and the vertical FFTs should be ordinary complex FFTs. But no matter how I order the input or swap the FFT methods, the Matlab simulation does not give the same result as cuFFT.
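
The two-pass structure described above can be sketched in a few lines. This is only an illustration of the N1*(N2/2+1) layout, not cuFFT's or FFTW's actual implementation: it uses a naive O(N^2) DFT instead of the half-length trick, and the function names `dft` and `r2c_2d` are made up for this sketch.

```python
import cmath

def dft(seq):
    """Naive, unnormalized forward 1D DFT (stand-in for a real FFT)."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def r2c_2d(x):
    """Real N1 x N2 input -> complex N1 x (N2/2+1) half spectrum."""
    n1, n2 = len(x), len(x[0])
    half = n2 // 2 + 1
    # Pass 1: transform each real row and keep only the non-redundant bins.
    # The dropped bins satisfy X[n2 - k] = conj(X[k]) for real input.
    rows = [dft(row)[:half] for row in x]
    # Pass 2: ordinary complex DFT down each of the remaining columns.
    cols = [dft([rows[i][k] for i in range(n1)]) for k in range(half)]
    # Transpose back to row-major N1 x (N2/2+1).
    return [[cols[k][i] for k in range(half)] for i in range(n1)]
```

For a 4 x 6 real input this returns 4 rows of 4 complex values, which should match the first 4 columns of a full 2D DFT of the same data.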

My Matlab design is referencing this method: http://processors.wiki.ti.com/index.php/Efficient_FFT_Computation_of_Real_Input
It has some mistakes, but I found them, and I have verified that the corrected method works perfectly for 1D FFTs.

For input that comes from a proper 2D R2C FFT, both cuFFT and my Matlab simulation can transform it back correctly. But when the input is random, the outputs differ (except for the top row, excluding its first number).
cuFFT must be using some optimization that assumes the input data came from a proper 2D FFT, because when the input is random its output is actually wrong.
Why do I care? Because I’m doing some processing in the frequency domain, so I’m not sure the input to C2R will always be a properly formed spectrum.
Does anyone know what’s going on behind the 2D C2R FFT in cuFFT? Thank you!

2D R2C and C2R exploit the Hermitian symmetry along the rows only (i.e. the number of columns is reduced).

study:

cuFFT :: CUDA Toolkit Documentation

Also, not every complex array has a corresponding real representation after the inverse transform. The R2C → C2R case is always guaranteed to reproduce essentially the same data as the input (up to the usual scaling, since cuFFT transforms are unnormalized), so if you are testing the inverse (C2R) transform, I would only use complex matrices that were produced by a proper R2C transform. I’m pretty sure any cases outside of that are undefined behavior.
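
To make the "not every complex array has a real representation" point concrete, here is a minimal sketch, assuming the N1 x (N2/2+1) row-major layout discussed above. In that layout the k2 = 0 column (and the k2 = N2/2 column when N2 is even) is its own mirror image, so it must satisfy X[k1][c] = conj(X[-k1][c]); random data generally does not. The names `is_consistent`, `symmetrize` and `c2r_2d` are hypothetical, the DFT is a naive one, and none of this claims to be what cuFFT does internally. `symmetrize` is just one possible way to force frequency-domain data back into a valid form before a C2R call.

```python
import cmath

def idft(seq):
    """Naive, unnormalized inverse 1D DFT."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def self_conjugate_columns(n2):
    """Columns of the half spectrum that mirror onto themselves."""
    return [0] + ([n2 // 2] if n2 % 2 == 0 else [])

def is_consistent(spec, n2, tol=1e-9):
    """True if X[k1][c] == conj(X[-k1][c]) on the self-conjugate columns."""
    n1 = len(spec)
    return all(abs(spec[k][c] - spec[-k % n1][c].conjugate()) <= tol
               for c in self_conjugate_columns(n2) for k in range(n1))

def symmetrize(spec, n2):
    """Replace the self-conjugate columns by their Hermitian part."""
    n1 = len(spec)
    out = [list(row) for row in spec]
    for c in self_conjugate_columns(n2):
        for k in range(n1):
            out[k][c] = 0.5 * (spec[k][c] + spec[-k % n1][c].conjugate())
    return out

def c2r_2d(spec, n2):
    """Inverse of the half-spectrum layout: N1 x (N2/2+1) -> N1 x N2.

    Returns complex values on purpose, so that any leftover imaginary
    part (a sign of inconsistent input) stays visible. Scaled by
    1/(N1*N2) for readability; cuFFT itself does not apply this scaling.
    """
    n1, half = len(spec), len(spec[0])
    # Pass 1: ordinary complex inverse DFT down each column.
    cols = [idft([spec[i][k] for i in range(n1)]) for k in range(half)]
    out = []
    for i in range(n1):
        row = [cols[k][i] for k in range(half)]
        # Pass 2: rebuild the dropped bins via Y[n2 - k] = conj(Y[k]).
        full = row + [row[n2 - k].conjugate() for k in range(half, n2)]
        out.append([v / (n1 * n2) for v in idft(full)])
    return out
```

With a spectrum that came from real data, `is_consistent` holds and `c2r_2d` returns values whose imaginary parts are at rounding level. With arbitrary data the imaginary parts are not negligible, and what a real-output routine does with them (drop them, or mix them in through a half-length trick) depends on the implementation, which would explain two implementations disagreeing on such input.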