Newbie cuFFT user - how to design Discrete Cosine Transform (type 2)

Hello,

I would like to build a Discrete Cosine Transform (type 2) using cuFFT on an NVIDIA Tesla V100 DEVICE, based on a complex-to-complex forward FFT. I am fairly new to cuFFT.

I am very early in the design phase of my DCT type 2 function, but briefly, the algorithm is as follows (please be advised it is really ROUGH at this stage and assumes all arrays are of cuComplex type):

  1. Mirror the input array [length: N] into a temp array [length: 2N]
  2. Call a forward FFT on the temp array
  3. Perform post-processing on the temp array, storing the result in an output array [length: N]

    for (int i = 0; i < N; ++i) {
        float val = PI * i / (2 * N);
        out[i].x = cos(val) * temp[i].x; // real part
        out[i].y = sin(val) * temp[i].y; // imaginary part
    }
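For reference, here is a sketch of what step 3 works out to. It assumes two things the steps above do not pin down: the unnormalized DCT-II convention, and a mirror of the form y[n] = x[n], y[2N-1-n] = x[n] for n = 0..N-1. Under those assumptions the post-processing is a complex multiplication by a phase factor, so the real and imaginary parts of temp get mixed rather than scaled separately as in the loop above.

```latex
% Assumed DCT-II definition (unnormalized):
X_k = \sum_{n=0}^{N-1} x_n \cos\!\left(\frac{\pi (2n+1) k}{2N}\right), \qquad k = 0, \dots, N-1

% Forward FFT of the mirrored sequence y of length 2N:
Y_k = \sum_{n=0}^{2N-1} y_n \, e^{-2\pi i k n / (2N)}
    = \sum_{n=0}^{N-1} x_n \left( e^{-i \pi k n / N} + e^{\, i \pi k (n+1) / N} \right)
    = 2 \, e^{\, i \pi k / (2N)} \sum_{n=0}^{N-1} x_n \cos\!\left(\frac{\pi (2n+1) k}{2N}\right)

% Hence, with \theta_k = \pi k / (2N):
X_k = \tfrac{1}{2} \, e^{-i \theta_k} \, Y_k

% For real-valued x this reduces to
X_k = \tfrac{1}{2} \left( \cos\theta_k \, \operatorname{Re} Y_k + \sin\theta_k \, \operatorname{Im} Y_k \right)
```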

All the above steps would be defined as kernels, with DEVICE memory allocated before the calls and reused throughout, so that only pointers to DEVICE-allocated arrays are passed to each kernel. However, I am not sure whether this is the optimal approach.
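A minimal sketch of the three steps might look like the following. The names `mirror_kernel`, `twiddle_kernel`, and `dct2_cufft` are hypothetical. The sketch also assumes the unnormalized DCT-II convention and a mirror of the form temp[n] = in[n], temp[2N-1-n] = in[n], neither of which is fixed by the outline above. The cuFFT calls used are `cufftPlan1d`, `cufftExecC2C`, and `cufftDestroy`. The scratch array and the plan are created on every call only to keep the example short.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Step 1 (assumed mirror): temp[n] = in[n] and temp[2N-1-n] = in[n].
__global__ void mirror_kernel(const cuComplex* in, cuComplex* temp, int N)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n < N) {
        temp[n] = in[n];
        temp[2 * N - 1 - n] = in[n];
    }
}

// Step 3: out[k] = 0.5 * exp(-i*pi*k/(2N)) * temp[k], written as a full complex multiply.
__global__ void twiddle_kernel(const cuComplex* temp, cuComplex* out, int N)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < N) {
        float s, c;
        sincosf(3.14159265358979f * k / (2.0f * N), &s, &c);
        cuComplex y = temp[k];
        out[k].x = 0.5f * (c * y.x + s * y.y);
        out[k].y = 0.5f * (c * y.y - s * y.x);
    }
}

// Hypothetical host wrapper. d_in and d_out are DEVICE arrays of length N.
// Returns true if every CUDA and cuFFT call succeeded.
bool dct2_cufft(const cuComplex* d_in, cuComplex* d_out, int N)
{
    assert(N > 0);

    cuComplex* d_temp = nullptr;
    if (cudaMalloc(&d_temp, sizeof(cuComplex) * 2 * N) != cudaSuccess) return false;

    cufftHandle plan;
    if (cufftPlan1d(&plan, 2 * N, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaFree(d_temp);
        return false;
    }

    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;

    mirror_kernel<<<blocks, threads>>>(d_in, d_temp, N);
    // Step 2: in-place forward FFT of length 2N on the mirrored array.
    bool ok = cufftExecC2C(plan, d_temp, d_temp, CUFFT_FORWARD) == CUFFT_SUCCESS;
    twiddle_kernel<<<blocks, threads>>>(d_temp, d_out, N);
    ok = ok && (cudaDeviceSynchronize() == cudaSuccess);

    cufftDestroy(plan);
    cudaFree(d_temp);
    return ok;
}
```

If the transform is called repeatedly with the same N, creating the plan and the temp array once and reusing them would likely avoid most of the per-call overhead. If the input is purely real, cuFFT's real-to-complex transform (`cufftExecR2C`) may be worth considering in place of C2C, since it avoids storing and processing the zero imaginary parts.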

Any general ideas/hints would be greatly appreciated.

Thank you.