FFT GFLOPS results with nice graph! For different sizes and batches.


I’ve been doing some FFTs on a 4800 FX this morning and the results are:

The FFT is done using CUFFT with toolkit 2.3 for complex single precision, i.e. 8 bytes per element.

The time measured is the time taken for a transform followed by its inverse. The number of floating point operations is m * 2 * (5 * n * ln(n)), where m is the number of batches, n is the number of elements per batch, and the “2” comes from there being 2 transforms.
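For reference, the flop count above can be turned into a small GFLOPS helper. This is just a sketch of the post’s own formula; the sizes and timing in the example are hypothetical placeholders, not measured values.

```python
import math

def fft_gflops(n, m, seconds):
    """GFLOPS for m batched forward+inverse complex FFTs of length n.

    Uses the post's operation count: m * 2 * (5 * n * ln(n)),
    where the factor 2 covers the forward and inverse transforms.
    """
    flops = m * 2 * (5 * n * math.log(n))  # ln, as in the post
    return flops / seconds / 1e9

# Hypothetical example: 128 batches of 1024-point FFTs taking 0.5 ms
print(round(fft_gflops(1024, 128, 0.5e-3), 1))
```

(Note that many FFT benchmarks use 5 * n * log2(n) instead of ln; the helper follows the post’s formula as written.)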

As can be seen, for batches of 128 elements apiece or more, the GFLOPS attained are mostly a function of the total number of elements, not of how they are split between the number of batches and the length of each transform.

Any comments? Do you think the results are reasonable? Do you think I’m uncool for using Excel rather than, say, Matlab?



Excel is for such n00bs! ;)

Damn you Jimmy!

Shoe flies across office (10p landing)

You’re eating lunch by yourself today!

That sort of makes sense, doesn’t it? The GPU is pretty much the embodiment of Gustafson’s Law, and this is pretty much what your results show. Larger input datasets in cuFFT mean more blocks per FFT, which is usually good for GPU throughput.

And yes, Excel is unspeakably uncool (as well as ugly as hell and really unsuited to just about any serious scientific endeavour). Matlab is passé as well. Something like Python’s matplotlib is what the cool kids are using these days.

EDIT: brain moving slightly faster than fingers in one spot.


Hmmm, I’m gonna have lunch with some cool Python-programming kids instead! :D