Hi.

I’m having some problems writing a CUDA fft2 implementation for MATLAB. The MATLAB docs say that when m and n are passed along with a matrix, the matrix is zero-padded or truncated to m-by-n before the fft2 is computed. My code truncates/pads the matrix correctly, but after running the 2D FFT only the first element is right; every other element in the matrix is wrong. Can anyone tell me why this happens? After truncating/padding the matrix, all I do is:

```
cufftHandle plan;
cufftPlan2d(&plan, M, N, CUFFT_C2C);
cufftExecC2C(plan, dmatrix, dmatrix, CUFFT_FORWARD);
cufftDestroy(plan);
```

but this gives the wrong result. If I skip the truncation/padding, however, I get the right result. Is there any difference between how MATLAB and cuFFT compute 2D FFTs? I’m really confused now.

EDIT: After some more testing, I see that this happens only if M != N. Does MATLAB's fft2 transpose the matrix before doing the 2D FFT, so that I have to do the same in my code?
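
One possible explanation, which can be checked on the host without a GPU: MATLAB stores matrices in column-major order, while `cufftPlan2d(&plan, nx, ny, ...)` treats `ny` as the contiguous (fastest-changing) dimension. If that is the cause, the plan would need the dimensions swapped, i.e. `cufftPlan2d(&plan, N, M, CUFFT_C2C)`, and no explicit transpose. The sketch below is only an illustration of that assumption using a naive DFT; the function names (`dft2_row_major`, `dft2_col_major`, `max_abs_diff`) are made up for this example and are not part of cuFFT or MATLAB.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive 2D DFT over a row-major buffer: `slow` rows of `fast` contiguous
// elements. This mimics the layout convention assumed for
// cufftPlan2d(&plan, slow, fast, ...); it is NOT cuFFT itself.
std::vector<cplx> dft2_row_major(const std::vector<cplx>& in, int slow, int fast) {
    assert(in.size() == static_cast<std::size_t>(slow) * fast);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(in.size());
    for (int ks = 0; ks < slow; ++ks)
        for (int kf = 0; kf < fast; ++kf) {
            cplx sum(0.0, 0.0);
            for (int s = 0; s < slow; ++s)
                for (int f = 0; f < fast; ++f) {
                    double angle = -two_pi * (double(ks) * s / slow + double(kf) * f / fast);
                    sum += in[s * fast + f] * std::polar(1.0, angle);
                }
            out[ks * fast + kf] = sum;
        }
    return out;
}

// Reference 2D DFT of an M-by-N matrix stored column-major (MATLAB layout):
// element (r, c) lives at index r + c * M.
std::vector<cplx> dft2_col_major(const std::vector<cplx>& in, int M, int N) {
    assert(in.size() == static_cast<std::size_t>(M) * N);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(in.size());
    for (int kr = 0; kr < M; ++kr)
        for (int kc = 0; kc < N; ++kc) {
            cplx sum(0.0, 0.0);
            for (int r = 0; r < M; ++r)
                for (int c = 0; c < N; ++c) {
                    double angle = -two_pi * (double(kr) * r / M + double(kc) * c / N);
                    sum += in[r + c * M] * std::polar(1.0, angle);
                }
            out[kr + kc * M] = sum;
        }
    return out;
}

// Largest element-wise absolute difference between two equally sized buffers.
double max_abs_diff(const std::vector<cplx>& a, const std::vector<cplx>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

For a column-major M-by-N buffer, comparing `dft2_col_major(a, M, N)` against `dft2_row_major(a, N, M)` (swapped) and against `dft2_row_major(a, M, N)` (unswapped) shows which call order reproduces the MATLAB-style result. In the unswapped case the first element still agrees, because it is just the sum of all entries regardless of layout, which would be consistent with the symptom described above.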