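CUFFT follows the same conventions as the FFTW library, so one way to do a real-to-real transform is to run a real-to-complex (R2C) transform. For N real input samples this yields a 1D array of N/2 + 1 complex values (the non-redundant half of the spectrum, so slightly more than half the size of the real time data). In each element the real part holds the cos() coefficient and the imaginary part holds the sin() coefficient; the forward transform is unnormalized and uses a negative exponent, so the sin() term comes out with a minus sign.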

```
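// Load real data on to the device
// (h_data is an assumed name for the host array holding ary_sz floats)
float *Td;
size_t size = ary_sz * sizeof(float);
CUDA_SAFE_CALL(cudaMalloc((void**)&Td, size));
CUDA_SAFE_CALL(cudaMemcpy(Td, h_data, size, cudaMemcpyHostToDevice));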
// Allocate device memory for signal: R2C writes ary_sz/2 + 1 complex values
cufftComplex *d_signal;
size_t mem_size = sizeof(cufftComplex) * (ary_sz/2 + 1);
CUDA_SAFE_CALL(cudaMalloc((void**)&d_signal, mem_size));
// CUFFT plan
cufftHandle planF, planI;
CUFFT_SAFE_CALL(cufftPlan1d(&planF, ary_sz, CUFFT_R2C, 1));
// Transform signal
CUFFT_SAFE_CALL(cufftExecR2C(planF, Td, d_signal));
```

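The complex FFT data lives in device memory, so copy d_signal back to the host with cudaMemcpy before reading it. Each element k of that array is a cufftComplex with two float fields:

d_signal[k].x [real part, for cos() values]

d_signal[k].y [imaginary part, for sin() values]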