I need the point symmetric FFT results

I use the CUDA (cuFFT) 1D real-to-complex FFT with the following parameters: n = 1024 sample points; batch = 512.

The 1D real-to-complex FFT returns a 1D array of complex values: I get (n/2 + 1) real and (n/2 + 1) imaginary parts. But I need the point-symmetric FFT result, and I don't know how to create this array.

I think I must first create an empty array of 1024x512 complex values. Then I copy the (n/2 + 1) FFT results into this array, and then I copy the point-symmetric results into it. This step must be repeated 512 times.

Can somebody help me, please?


What do you mean by point symmetric? Instead of a real transform you can define a complex array with the imaginary parts set to zero and then do a complex-to-complex transform. The result will be a complex array with n elements. Counting from 1, the first n/2+1 elements correspond to k from 0 to kmax, element n/2+1 corresponds to both kmax and -kmax, and the remaining elements, n/2+2 to n, correspond to negative k from -kmax+1 to -1.
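
To make that index layout concrete, here is a small C sketch that maps a 0-based output index of an n-point complex-to-complex transform to its signed wavenumber index. The function name `bin_to_k` is made up for illustration, and the sketch assumes n is even (as with n = 1024 here).

```c
#include <assert.h>

/* Hypothetical helper: signed wavenumber index for 0-based bin i of an
 * n-point complex-to-complex FFT (n even).
 * Bins 0..n/2 hold k = 0..n/2; bins n/2+1..n-1 hold k = -(n/2-1)..-1.
 * Bin n/2 is the Nyquist bin, shared by +kmax and -kmax. */
int bin_to_k(int i, int n)
{
    assert(n > 0 && n % 2 == 0);
    assert(i >= 0 && i < n);
    return (i <= n / 2) ? i : i - n;
}
```

For n = 1024, `bin_to_k(513, 1024)` gives -511 and `bin_to_k(1023, 1024)` gives -1.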

So if I need the negative k as well, I must use a complex-to-complex FFT. At the moment I use only the real-to-complex FFT. How can I convert a 1D array of real data type (float) to a complex data type with imaginary parts = 0?

Something like this:

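__global__ void realtocomplex(cufftReal *in, cufftComplex *out)
{
    // one thread per sample; no bounds check, so the grid must cover exactly n*batch elements
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i].x = in[i];   // real part
    out[i].y = 0.0f;    // imaginary part
}

call with:

realtocomplex<<<(1024*512)/256, 256>>>(d_in, d_out);   // d_in, d_out: device arrays of 1024*512 elements (names and launch configuration assumed)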

This is rough code; you might have to adjust it to make it work.

It is not really necessary to do a complex-to-complex transform. If you do a real transform you get the values corresponding to positive k. If you have the value of the k component psik(k), then the component corresponding to -k is just its complex conjugate. So psik(-k) = complex_conjugate(psik(k)).
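
As a minimal sketch of how that symmetry fills in the missing half for one transform: the C function below copies the n/2+1 stored values and writes the conjugates into the remaining n/2-1 slots, using the same ordering a complex-to-complex transform would produce. The type `cplx` is only a stand-in for cufftComplex and `expand_half_spectrum` is a hypothetical name; n is assumed to be even.

```c
#include <assert.h>

/* Stand-in for cufftComplex (a pair of floats: x = real, y = imaginary). */
typedef struct { float x, y; } cplx;

/* Hypothetical helper: rebuild the full n-point spectrum of ONE real signal
 * from the n/2+1 values a real-to-complex transform returns.
 * half must hold n/2+1 elements, full must hold n elements, n must be even. */
void expand_half_spectrum(const cplx *half, cplx *full, int n)
{
    assert(n > 0 && n % 2 == 0);
    /* k = 0 .. n/2: copied unchanged */
    for (int k = 0; k <= n / 2; k++)
        full[k] = half[k];
    /* k = 1 .. n/2-1: the value for -k lives at index n-k and is conj(half[k]) */
    for (int k = 1; k < n / 2; k++) {
        full[n - k].x =  half[k].x;
        full[n - k].y = -half[k].y;
    }
}
```

With n = 1024 the first loop copies 513 values and the second writes 511 conjugates, which together give the 1024 complex results.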

Thanks for the example. I think I can use the real-to-complex FFT, but then I must insert the negative k values into the 1D array myself. I think the example is the best method.

Does the complex-to-complex FFT need more execution time than the real-to-complex FFT?

Yes, if you do a real-to-complex transform you need to construct a new array and insert the values for the negative k. The complex-to-complex transform automatically gives you all the values, but it will be roughly 2 times slower than the real-to-complex transform. It is now up to you to choose between comfort and speed. To begin with, I would suggest going the simple way (complex-to-complex transform).

But I need the fast version because I have a large data stream.

  1. First I malloc a new array in GPU memory.
  2. Then I calculate the FFT of 1024 sample points (batch = 512).
  3. After that I copy the 513 complex FFT results to the new array. Then I must copy the 511 negative-k values to the array.
  4. Repeat step 3 512 times because batch = 512.

But I have no idea how to copy the negative k values and the FFT results into the new array.

Can somebody post an example of the copy process?

What do you mean you have no idea how to copy the negative k values? Do you not know how to do it in CUDA, or do you not know how to do it at all, even in simple C?

Here is simple code in C for batch=1:

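for(int i=0;i<513;i++)
{
    // vec holds the 513 FFT results (array name assumed)
    newvec[512+i]=vec[i];   // k >= 0
    newvec[512-i]=vec[i];   // k <= 0; for complex data store the complex conjugate of vec[i] here
}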

The newvec array will be of size 1025, with newvec[0] corresponding to k=-kmax, newvec[512] to k=0 and newvec[1024] to k=kmax.

For batch > 1:

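for(int offset=0;offset<batch;offset++)
{
    for(int i=0;i<513;i++)
    {
        // each batch has 513 inputs in vec (name assumed) and 1025 outputs in newvec
        newvec[offset*1025+512+i]=vec[offset*513+i];
        newvec[offset*1025+512-i]=vec[offset*513+i];   // complex conjugate for complex data
    }
}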

It should be straightforward to convert this to CUDA, taking into account that you have complex numbers.

Maybe if you tell us more details, we can see why you need the full spectrum. Maybe I can suggest a way to get around the need to copy the redundant data.
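
A rough sketch of what that conversion could look like, still in plain C but written so that every output element is computed from a single flat index, which is the shape one CUDA thread would have. It uses the n-element complex-to-complex ordering (1024 values per batch) rather than the centred 1025-element layout. The names `cplx`, `full_element` and `expand_batched` are hypothetical, and the code assumes the real-to-complex output is stored batch after batch with n/2+1 values each.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cufftComplex (x = real part, y = imaginary part). */
typedef struct { float x, y; } cplx;

/* Value of full-spectrum element j, where j runs over all batch*n outputs.
 * half holds n/2+1 values per batch, stored one batch after another. */
cplx full_element(const cplx *half, int n, int j)
{
    int row = j / n;                      /* which transform of the batch */
    int k   = j % n;                      /* bin inside that transform */
    const cplx *h = half + (size_t)row * (size_t)(n / 2 + 1);
    cplx v;
    if (k <= n / 2) {
        v = h[k];                         /* stored value, k = 0..n/2 */
    } else {
        v.x =  h[n - k].x;                /* negative k: complex conjugate */
        v.y = -h[n - k].y;
    }
    return v;
}

/* Fill the full array; in CUDA the loop would be replaced by one thread per j. */
void expand_batched(const cplx *half, cplx *full, int n, int batch)
{
    assert(n > 0 && n % 2 == 0 && batch > 0);
    for (int j = 0; j < n * batch; j++)
        full[j] = full_element(half, n, j);
}
```

In a kernel, j would come from blockIdx.x * blockDim.x + threadIdx.x, with a check that j < n * batch before writing.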

I need the whole FFT result for a cross-correlation of 2 pictures. The cross-correlation is based on an FFT algorithm. Solving the copy process in C is no problem, but at the moment I have trouble solving it in CUDA.

Today I'm sick. I will test it in 1 or 2 days.
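
For the cross-correlation itself, the copy step might be avoidable. By the correlation theorem the spectrum of the cross-correlation is the product of one spectrum with the complex conjugate of the other, and for two real inputs that product has the same conjugate symmetry as the inputs. So it can be formed on the n/2+1 stored values only and then passed to a complex-to-real inverse transform. A minimal sketch of the product in C follows; `cplx` stands in for cufftComplex and `multiply_conj` is a hypothetical name. Which factor gets conjugated, and the 1/n scaling after the inverse transform, depend on your convention.

```c
#include <assert.h>

/* Stand-in for cufftComplex (x = real part, y = imaginary part). */
typedef struct { float x, y; } cplx;

/* out[i] = a[i] * conj(b[i]) for count elements.
 * out may be the same array as a or b. */
void multiply_conj(const cplx *a, const cplx *b, cplx *out, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        cplx p;
        p.x = a[i].x * b[i].x + a[i].y * b[i].y;   /* real part */
        p.y = a[i].y * b[i].x - a[i].x * b[i].y;   /* imaginary part */
        out[i] = p;
    }
}
```

With n = 1024 and batch = 512, count would be 512 * 513, and each loop iteration is independent, so it maps to one thread per element in a kernel.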