what's cufft batch?

riclas · May 13, 2008, 9:30pm

hi,

i have a 4096-sample array that i want to apply an FFT to. will batching the array improve speed? is it like dividing the FFT into small DFTs and then computing the whole FFT?

i don’t quite understand the use of the batch, and didn’t find explicit documentation on it… i think it might be one of two things, either:
divide one FFT calculation into parallel DFTs to speed up the process
calculate one FFT x times and average it for the result

both might be wrong ^^

anyone care to explain? maybe show me an explained example?

with batch=1 the FFTs take so much more time than IPP :\ i wanted to speed this up… (now it’s like 3 seconds IPP, 20 seconds CUFFT; 4096 samples C2C, 10000 1D FFTs, without magnitude calculation).
i don’t know if batching is the answer for it though…

thanks.

mattb3 · May 14, 2008, 2:09am

The batch feature is simply used to compute the FFTs of multiple vectors in a single call. This is much more efficient than calling the FFT over and over in a loop, since some of the intermediate twiddle factors can be reused. To use batching for your application, all 10000 of the 4096-point inputs should be stored in one long contiguous block of linear memory (40,960,000 elements total), each input directly following the previous one.

The plan would look like:

cufftPlan1d(&myPlan,4096,CUFFT_C2C,10000);

The execution would look like:

cufftExecC2C(myPlan, idata, odata, CUFFT_FORWARD);

I’m pretty sure that a G80 should beat a CPU for this many FFTs, even including the host/device transfers. Good luck.
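
To make the layout concrete, here is a small CPU-side sketch in C++ of what a batched 1D C2C forward transform computes: `batch` independent n-point DFTs over signals stored back to back in one linear array, where signal b starts at offset b * n. The function name `batched_dft` and the alias `cplx` are hypothetical, and the naive O(n^2) DFT is only a reference for the semantics, not how CUFFT computes it internally.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Reference semantics of a batched forward transform (illustration only):
// the input holds `batch` signals of length n, one after another, and each
// signal gets its own n-point DFT. Nothing is averaged across signals and
// no single transform is split between batch entries.
std::vector<cplx> batched_dft(const std::vector<cplx>& in,
                              std::size_t n, std::size_t batch) {
    assert(in.size() == n * batch);  // one contiguous array holding all signals
    const double two_pi = 6.283185307179586;
    std::vector<cplx> out(n * batch);
    for (std::size_t b = 0; b < batch; ++b) {
        const cplx* x = in.data() + b * n;  // start of signal b
        cplx* X = out.data() + b * n;       // start of spectrum b
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t t = 0; t < n; ++t) {
                // Forward transform uses exp(-2*pi*i*k*t/n).
                const double angle = -two_pi * double((k * t) % n) / double(n);
                acc += std::complex<double>(x[t]) *
                       std::complex<double>(std::cos(angle), std::sin(angle));
            }
            X[k] = cplx(acc);
        }
    }
    return out;
}
```

With n = 4096 and batch = 10000, spectrum b occupies elements b * 4096 through b * 4096 + 4095 of the output, matching the layout the single batched call expects for its input.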

riclas · May 14, 2008, 2:30pm

thank you, your post was most informative.
something like this should be in the programming guide, it’s not explicit there how the batch works.

my problem was i was calling my function from the host 10000 times.
if i just put the cufftExec in a loop the results change very much.

100000 times gives 20 seconds on the gpu and 34 on the cpu, although the cpu calculates the magnitude of the values in each iteration and the gpu only does it once so far.
now i have to work on threading the magnitude function :)
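
a minimal sketch of that magnitude step, written as plain C++ on the CPU with a hypothetical helper `batch_magnitudes`: it makes one pass over the whole batched output instead of one pass per FFT. each output element depends only on the input element at the same index, so the loop body is the work a single GPU thread would do; the kernel itself is not shown here.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: magnitude of every bin in a batched spectrum.
// `spectrum` holds `batch` spectra of length n stored back to back.
std::vector<float> batch_magnitudes(const std::vector<std::complex<float>>& spectrum,
                                    std::size_t n, std::size_t batch) {
    assert(spectrum.size() == n * batch);
    std::vector<float> mag(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        // Element i is independent of all others: sqrt(re*re + im*im).
        mag[i] = std::abs(spectrum[i]);
    }
    return mag;
}
```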

Electro · May 26, 2008, 4:04pm

I’m trying to run as many 1k FFTs as I can on a Tesla card, but the largest number I can use without getting “CUFFT_INVALID_VALUE” is much lower than 100000…
When I compare the time between the CPU and my calculation on the GPU, I don’t see much of a difference…
Obviously, there is something wrong in my code…

Any idea ?
