Hi!
I need to move some calculations to the GPU, where I will compute a batch of 32 2D FFTs, each of size 600 x 600. When I compare the performance of cuFFT with MATLAB's GPU fft, cuFFT is much slower, typically by a factor of 10 (after removing all overhead from things like plan creation). How is this possible? Is this what to expect from cuFFT, or is there a way to speed it up? (I would simply use MATLAB's fft if I could, but when I mix it with some iffts, sums and element-wise multiplications it becomes very slow in an unpredictable way.)
// The core of my code
mwSize const * dimSize = mxGPUGetDimensions(C_q);
// FFT test
cufftHandle plan;
// MATLAB stores arrays column-major and cuFFT expects row-major,
// so the first two dimensions are swapped for the 2D plan.
int dd[2];
dd[0] = (int)dimSize[1];
dd[1] = (int)dimSize[0];
// The third dimension is the batch size (number of 2D FFTs per call).
int Nq = (int)dimSize[2];
dimSize = mxGPUGetDimensions(Phi_j);
int L = (int)dimSize[2];
// Note: quite some overhead here. Passing NULL for inembed/onembed selects the default (contiguous) memory layout. This seems to give the right answer. Is that OK?
if (cufftPlanMany(&plan, 2, dd, NULL, 0, 0, NULL, 0, 0, CUFFT_C2C, Nq) != CUFFT_SUCCESS)
    mexErrMsgTxt("cufftPlanMany failed");
// Loop and sum over singular values
for (int i = 0; i < L; i++)
{
    // Do the FFT (a batch of Nq 2D transforms from pS_q into pC_q)
    cufftExecC2C(plan, (cufftComplex *) pS_q, (cufftComplex *) pC_q, CUFFT_FORWARD);
}
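To make the layout question concrete, here is a minimal host-only sketch (no GPU needed) of why swapping the first two dimensions should be consistent with the default contiguous layout. It assumes the array is a dense rows x cols x batch MATLAB array and that the plan is created with NULL inembed/onembed, as in the call above. `BatchLayout`, `layout_from_matlab_dims`, `matlab_offset`, `cufft_offset` and `layouts_agree` are hypothetical names made up for this illustration; they are not part of cuFFT or the MEX API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical description of a contiguous batched 2D layout. */
typedef struct {
    int n[2];   /* n[0] = slowest-varying dimension, n[1] = fastest-varying */
    int batch;  /* number of 2D transforms */
    int dist;   /* elements between the starts of consecutive transforms */
} BatchLayout;

/* Map MATLAB dims {rows, cols, batch} (column-major) to a row-major layout. */
static BatchLayout layout_from_matlab_dims(const size_t dims[3])
{
    BatchLayout lay;
    assert(dims[0] * dims[1] <= (size_t)INT_MAX && dims[2] <= (size_t)INT_MAX);
    lay.n[0] = (int)dims[1];          /* columns become the slow dimension */
    lay.n[1] = (int)dims[0];          /* rows are contiguous in memory */
    lay.batch = (int)dims[2];
    lay.dist = lay.n[0] * lay.n[1];   /* one full 2D slice per transform */
    return lay;
}

/* Linear offset of element (r, c, b) in a column-major MATLAB array. */
static size_t matlab_offset(const size_t dims[3], size_t r, size_t c, size_t b)
{
    return r + dims[0] * (c + dims[1] * b);
}

/* Linear offset of element (x0, x1) of transform b in the row-major layout. */
static size_t cufft_offset(const BatchLayout *lay, size_t x0, size_t x1, size_t b)
{
    return b * (size_t)lay->dist + x0 * (size_t)lay->n[1] + x1;
}

/* Returns 1 if both indexing schemes address the same memory everywhere. */
static int layouts_agree(const size_t dims[3])
{
    BatchLayout lay = layout_from_matlab_dims(dims);
    for (size_t b = 0; b < dims[2]; b++)
        for (size_t c = 0; c < dims[1]; c++)
            for (size_t r = 0; r < dims[0]; r++)
                if (matlab_offset(dims, r, c, b) != cufft_offset(&lay, c, r, b))
                    return 0;
    return 1;
}
```

Calling `layouts_agree` with dims `{600, 600, 32}` walks every element and compares the two offsets. Under these assumptions the swap only relabels the axes and does not move any data, which would be consistent with the results coming out right.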
/ Anders