I have been benchmarking various FFT libraries, and I keep reading that CUFFT should perform better as I increase the size of my batch, and that by batching the FFTs I should see a marked speedup, but I have yet to see one.
My FFTs are 12288 elements long, and I need to do 540 of them.
If I do them one at a time (foolish implementation) I get about 0.14 ms per FFT.
If I batch all 540 together (correct implementation) the whole run takes about 77 ms, which works out to roughly the same 0.14 ms per FFT.
The problem is that I get essentially no speedup by running them at the same time. Does anyone know why that might be? I can provide the source code if anyone wants it, but it seems to me to be trivial.
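For reference, here is roughly what I mean by the batched version: a single plan created with `cufftPlan1d` and a batch count, executed in one call. This is a minimal sketch, not my actual code; it assumes complex-to-complex transforms, in-place execution, and that `d_data` already holds all 540 signals contiguously on the device (names like `run_batched_fft` are just for illustration):

```c
#include <cufft.h>

/* Sketch: one plan that executes all 540 transforms of length
   12288 in a single cufftExecC2C call. Assumes d_data points to
   a device buffer of 540 * 12288 cufftComplex values, with the
   signals laid out back-to-back. Error checking omitted. */
void run_batched_fft(cufftComplex *d_data)
{
    cufftHandle plan;
    cufftPlan1d(&plan, 12288, CUFFT_C2C, 540);   /* batch = 540 */
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cufftDestroy(plan);
}
```

My timings are for the `cufftExecC2C` call itself, not the host-to-device copies.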
I'm not entirely sure about the batch end of things, but I would recommend zero-padding the FFT out to the next power of two (16384 in this case). I believe the CUFFT guide mentions that there is a specially optimized routine for 1D FFTs whose lengths are powers of 2, and that CUFFT will not make this adjustment for you. I noticed a significant speedup when I did this for a very large convolution routine.