Hi
I am using the CUFFT library, version 3.1, and comparing 1D CUFFT transforms running on an NVIDIA GTX 260 (216) with MATLAB's FFT running on the CPU. I know the CPU is faster for small FFT sizes (< 1024), but with batched FFTs I expected CUFFT to be faster at any FFT size.
I use power-of-two sizes and the GPUmat wrapper for the CUFFT API. I always find that the FFT on the CPU is much faster than CUFFT on the GPU for FFT sizes below 2048, even with batched FFTs (the total number of points is fixed at 8 million; see attached figures). Can anyone explain this or suggest what I should try?
Here is the MATLAB code:
function d_A = GPUfft(d_A,d_B,N,Batch)
% Forward then inverse batched 1D C2C FFT on the GPU.
% d_A, d_B : GPU arrays (input and work buffer); N : FFT size; Batch : number of transforms
fftType = cufftType;
fftDir = cufftTransformDirections;
% FFT plan: Batch transforms of length N
plan = 0;
[status, plan] = cufftPlan1d(plan, N, fftType.CUFFT_C2C, Batch);
cufftCheckStatus(status, 'Error in cufftPlan1d');
% Run GPU FFT (d_A -> d_B)
[status] = cufftExecC2C(plan, getPtr(d_A), getPtr(d_B), fftDir.CUFFT_FORWARD);
cufftCheckStatus(status, 'Error in cufftExecC2C (forward)');
% Run GPU IFFT (d_B -> d_A)
[status] = cufftExecC2C(plan, getPtr(d_B), getPtr(d_A), fftDir.CUFFT_INVERSE);
cufftCheckStatus(status, 'Error in cufftExecC2C (inverse)');
% CUFFT transforms are unnormalized, so after the forward + inverse round trip
% d_A holds N times the input; scale by 1/N before comparing with the CPU
% h_B = 1/N.*single(d_A);
% Release the plan
[status] = cufftDestroy(plan);
cufftCheckStatus(status, 'Error in cufftDestroy');
end
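To make the 1/N comment in the function concrete, here is a minimal CPU-only sketch of the same effect. It does not call CUFFT or GPUmat: it mimics an unnormalized inverse transform by multiplying MATLAB's (normalized) ifft by N. The sizes N = 8 and Batch = 4 and the variable names are illustrative assumptions, not values from the benchmark.

```matlab
% CPU-only illustration of the 1/N scaling (no GPU calls).
N = 8;                                   % transform length (illustrative)
Batch = 4;                               % number of transforms (illustrative)
x = complex(rand(N, Batch, 'single'), rand(N, Batch, 'single'));

X = fft(x);                              % forward FFT of each column
unscaled = N .* ifft(X);                 % mimics an unnormalized inverse transform
rescaled = (1/N) .* unscaled;            % apply the 1/N factor afterwards

maxErr = max(abs(rescaled(:) - x(:)));   % expected to be near single-precision round-off
fprintf('max round-trip error = %g\n', maxErr);
```

Without the final 1/N factor, the round-trip result is N times the input, which is what the function above returns in d_A.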
Results:
=============================
N        GPU time    CPU time
2        0.204395    0.000090
4        0.014395    0.000029
8        0.014310    0.000027
16       0.013884    0.000021
32       0.014274    0.000031
64       0.014726    0.000069
128      0.014784    0.000181
256      0.015566    0.000527
512      0.014721    0.001977
1024     0.017689    0.007305
2048     0.020455    0.025084
4096     0.021909    0.103657
8192     0.026931    0.465617
16384    0.032093    2.494288
============================================
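For a like-for-like comparison, the CPU side can be timed on the same batch layout as the GPU plan, with one N-point FFT per column and Batch = total points / N. The following is a minimal sketch of such a CPU baseline, assuming that "8 million" means 2^23 points and that the input is single-precision complex. The names totalPoints, sizes and cpuTime are hypothetical and are not taken from the original timing script.

```matlab
% Hypothetical CPU baseline: time a batched FFT with a fixed total point count.
totalPoints = 2^23;                      % 8,388,608 points (assumed meaning of "8 million")
sizes = 2.^(1:14);                       % N = 2 ... 16384, the sizes listed in the results
cpuTime = zeros(size(sizes));

for k = 1:numel(sizes)
    N = sizes(k);
    Batch = totalPoints / N;             % number of transforms at this size
    x = complex(rand(N, Batch, 'single'), rand(N, Batch, 'single'));
    tic;
    y = fft(x);                          % one N-point FFT per column
    cpuTime(k) = toc;
    fprintf('CPU time for %d (Batch = %d) = %f\n', N, Batch, cpuTime(k));
end
```

MATLAB's fft operates along the first non-singleton dimension, so an N-by-Batch matrix gives Batch independent N-point transforms in a single call, matching what cufftPlan1d(plan, N, CUFFT_C2C, Batch) describes on the GPU side.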