CUFFT out-of-place transform destroys input?

I am using the CUFFT library to do what should be a simple FFT, IFFT.
Here’s the gist of it:

cufftExecR2C(forwardFFTPlan, d_img, d_fft_img);
print(d_fft_img);
cufftExecC2R(inverseFFTPlan, d_fft_img, d_img);
print(d_fft_img);

What I am noticing is that d_fft_img is different in the second print call, leading me to believe that CUFFT is destroying its input data. However, I can't find this mentioned anywhere in the documentation, and it seems like the kind of thing you would want to document. I scoured the forums and found only one other person mentioning the same issue, without a definitive answer.

Does anyone know for certain, one way or another, if the CUFFT functions (namely, C2R) ruin their input data?
Or can you offer any other plausible explanation as to why this would be happening?

Thanks in advance for any help, suggestions, or ideas.

[edit]

First, thanks to the people who have commented so far.
A few clarifications I should have made the first time around:

  • the results of each transform are correct
  • there’s nothing before, after, or in between these two calls - something else modifying the data is a non-issue.
  • “print” is a simplification here - I’m not accidentally printing the address of the pointer, or cpu mem instead of gpu mem, or anything like that : )
  • again, thanks for any input

(Sorry, my mistake: I should've checked the manual first. Padding only applies to in-place transforms, which you don't use, and you say it works the first time.)

Since your second call is a C2R transform, it's most likely that your data wasn't in the expected format (the manual doesn't make this very clear). For R2C and C2R, you need to have the correct number of elements in each row:

C2R:
n complex / row => 2n - 2 reals / row (or 2n - 1 if the logical row length is odd)

R2C:
n reals / row => ⌊n/2⌋ + 1 complex / row
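The same arithmetic as a small host-side sketch. The helper names are hypothetical (they are not part of cuFFT), and "per row" is taken to mean the last, fastest-varying dimension, which is the only one cuFFT halves in its multi-dimensional R2C/C2R layouts.

```cuda
// Hypothetical helpers (not cuFFT API): row lengths for R2C / C2R layouts.
#include <cassert>
#include <cstddef>

// R2C: n_real samples per row -> floor(n_real / 2) + 1 complex values.
// Unsigned integer division already rounds down.
inline std::size_t r2c_complex_per_row(std::size_t n_real)
{
    assert(n_real > 0);
    return n_real / 2 + 1;
}

// C2R: n_complex values per row expand to 2*n_complex - 2 reals when the
// logical length is even, or 2*n_complex - 1 when it is odd. The complex
// count alone cannot tell these apart; the plan's logical size decides.
inline std::size_t c2r_real_per_row(std::size_t n_complex, bool logical_is_odd)
{
    assert(n_complex >= (logical_is_odd ? 1u : 2u));
    return 2 * n_complex - (logical_is_odd ? 1 : 2);
}
```

For example, a plan with 9 reals per row produces 5 complex values per row, and so does a plan with 8 reals per row. That is why a C2R plan must be created with the logical (real) size rather than derived from the complex row length.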

The Fourier transform of real data doesn't change the degrees of freedom (the spectrum is Hermitian-symmetric), so the "padding" is probably there to simplify addressing. Intel IPP's complex transform format doesn't have any padding.
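The degrees-of-freedom argument rests on Hermitian symmetry: for real input x of length N, the DFT satisfies X[N-k] = conj(X[k]), so the bins above N/2 are redundant, and X[0] (plus X[N/2] for even N) is purely real. The N/2 + 1 complex values (N + 2 floats for even N) therefore carry exactly N independent reals, and the two extra floats are the "padding". The host-only sketch below checks this with a naive O(N^2) DFT. naive_dft and is_hermitian are illustrative helpers, not cuFFT functions.

```cuda
// Host-only sketch: naive DFT of real input plus a Hermitian-symmetry check.
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N), computed directly in O(N^2).
std::vector<std::complex<double>> naive_dft(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            sum += x[t] * std::polar(1.0, angle);
        }
        X[k] = sum;
    }
    return X;
}

// True if X[0] is real and X[n-k] == conj(X[k]) for k = 1..n-1, within tol.
// For even n, k = n/2 maps onto itself, so the Nyquist bin must be real too.
bool is_hermitian(const std::vector<std::complex<double>>& X, double tol)
{
    assert(!X.empty());
    const std::size_t n = X.size();
    for (std::size_t k = 1; k < n; ++k) {
        if (std::abs(X[n - k] - std::conj(X[k])) > tol) return false;
    }
    return std::abs(X[0].imag()) <= tol;
}
```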

I am sure that CUFFT doesn't alter the input data; I have used it extensively and never had this problem.
So the bug in your code is elsewhere (I think).

Hello,

This is a very old post, but it affects me: I see similar behaviour. My code is a set of iterations of the same task.

I start with a matrix psi and its Fourier transform psik.

I apply the following algorithm nsteps times:

  1. use a kernel to calculate a matrix nt[i] = psi[i]^3
  2. take the FT of the matrix nt → ntk
  3. update in k-space: psik[i] = psik[i]*f1[i] + ntk[i]*fk[i]
  4. take the IFT of psik → psi to obtain the new result
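Step 3 is a pointwise update in k-space. A minimal sketch of what such a kernel could look like follows. k_update and k_update_kernel are hypothetical names (the poster's actual kernel is kupdt, whose body isn't shown), and the sketch assumes f1 and fk are real per-mode coefficients. It uses float2, which is what cufftComplex is defined as. The key point is that the update reads psik[i] left over from the previous iteration, so this is where an overwritten spectrum shows up.

```cuda
// Hypothetical sketch of step 3 (not the poster's kupdt kernel).
// Assumes f1 and fk are real coefficients, one per complex mode.
#include <cassert>
#include <cuda_runtime.h>

// psik[i] = psik[i] * f1[i] + ntk[i] * fk[i] for one complex mode.
__host__ __device__ inline float2 k_update(float2 psik, float2 ntk, float f1, float fk)
{
    return make_float2(psik.x * f1 + ntk.x * fk,
                       psik.y * f1 + ntk.y * fk);
}

// One thread per complex mode. n_modes counts complex elements, i.e. the
// halved R2C size, not the number of real grid points.
__global__ void k_update_kernel(float2* psik, const float2* ntk,
                                const float* f1, const float* fk, int n_modes)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_modes) {
        // Reads psik[i] from the previous iteration: if step 4's C2R
        // overwrote psik, this read sees modified data.
        psik[i] = k_update(psik[i], ntk[i], f1[i], fk[i]);
    }
}
```

A typical launch would be k_update_kernel<<<(n_modes + 255) / 256, 256>>>(...), with one thread per complex mode.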

Then we go back to step 1. Since nothing is done in between, the matrices psi and psik should survive: when I return to step 1, psik should still be in memory, but it gets lost somehow. So the algorithm only works if I add another step:

  5. take the FT of psi → psik

This is the code with the steps:

  1. nonlinterm<<<grid,threads>>>(dbff,dpsi, lx,ly,lz,awu,bwu);

  2. cufftExecR2C(prc,dbff,dbffc);

  3. kupdt<<<grid,threads>>>(dkpsi,dbffc,dqq,dcoef,lx,ly,lz,dt);

  4. cufftExecC2R(pcr,dkpsi,dpsi);

// 5) cufftExecR2C(prc,dpsi,dkpsi);

Hi all, I know this is an old post, but I had the same question, so I'm posting the answer for anyone who runs into the same issue.
This is the expected, documented behavior of cufftExecC2R:

The complex-to-real transform is implicitly inverse. For in-place complex-to-real FFTs where FFTW compatible output is selected (default padding mode), the input size is assumed to be ⌊N/2⌋+1 cufftComplex elements. Note that in-place complex-to-real FFTs may overwrite arbitrary imaginary input point values when non-unit input and output strides are chosen. Out-of-place complex-to-real FFT will always overwrite input buffer. For out-of-place transforms, input and output sizes match the logical transform non-redundant size ⌊N/2⌋+1 and size N, respectively.
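Given that, a common workaround is to keep the spectrum you still need and let cuFFT overwrite a scratch copy. The sketch below assumes a C2R plan whose input holds n_complex cufftComplex elements and that the plan runs on the default stream. If you have called cufftSetStream, do the copy with cudaMemcpyAsync on that same stream instead. c2r_keep_input and d_scratch are hypothetical names; cudaMemcpy and cufftExecC2R are the real CUDA runtime and cuFFT calls.

```cuda
// Sketch: out-of-place C2R that leaves the original spectrum intact by
// handing cuFFT a scratch copy to overwrite. Link with -lcufft.
#include <cstddef>
#include <cuda_runtime.h>
#include <cufft.h>

// d_spectrum: complex data to keep (n_complex cufftComplex elements)
// d_scratch : caller-provided device buffer of the same size
// d_out     : real output buffer expected by the plan
cufftResult c2r_keep_input(cufftHandle plan, const cufftComplex* d_spectrum,
                           cufftComplex* d_scratch, cufftReal* d_out,
                           std::size_t n_complex)
{
    cudaError_t err = cudaMemcpy(d_scratch, d_spectrum,
                                 n_complex * sizeof(cufftComplex),
                                 cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) return CUFFT_EXEC_FAILED;
    // cuFFT may overwrite d_scratch here; d_spectrum is never passed to it.
    return cufftExecC2R(plan, d_scratch, d_out);
}
```

In the iteration loop posted earlier, step 4 would become c2r_keep_input(pcr, dkpsi, d_scratch, dpsi, n_complex), with d_scratch allocated once outside the loop. That replaces the extra forward FFT of step 5 with a device-to-device copy. Note that cuFFT transforms are unnormalized, so an R2C followed by a C2R returns the input scaled by N.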