FFT speed gain when switching to CUDA 2.3

Before upgrading to CUDA 2.3, I wrote a small FFT benchmark to see how the new release would perform. I did not expect much difference, but I found a substantial gain (roughly a factor of three) with the newer CUDA version, especially for larger FFT sizes. Can anybody else confirm this behavior? Does the new FFT library use more sophisticated algorithms? What boosts the performance that much?

Results are documented here.
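A benchmark like this mostly boils down to two numbers per FFT size: throughput and the old/new speedup. The following is a minimal Python sketch of that bookkeeping. It is not the benchmark used above. It assumes the common 5·N·log2(N) flop estimate for one complex FFT of length N (a convention used by FFT benchmarks such as benchFFT, not a count of what CUFFT actually executes), and the function names are hypothetical. The timings themselves would come from the real CUFFT runs. For GPU code, the timed call must wait until the device has finished; otherwise it only measures the asynchronous launch.

```python
import math
import time


def best_time(fn, repeats=10):
    """Smallest wall-clock time over `repeats` calls of fn.

    Assumes fn blocks until its work is complete (for GPU work, fn must
    synchronize with the device before returning).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def fft_gflops(n, batch, seconds):
    """GFLOP/s for `batch` complex 1D FFTs of length n that took `seconds`.

    Uses the conventional 5 * n * log2(n) flop estimate per transform.
    """
    if n < 2 or batch < 1 or seconds <= 0:
        raise ValueError("need n >= 2, batch >= 1 and a positive time")
    flops = 5.0 * n * math.log2(n) * batch
    return flops / seconds / 1e9


def speedup(old_seconds, new_seconds):
    """How many times faster the new run is (old time / new time)."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("times must be positive")
    return old_seconds / new_seconds


def compare(sizes, old_times, new_times, batch=1):
    """Return (n, old GFLOP/s, new GFLOP/s, speedup) for each FFT size."""
    rows = []
    for n, t_old, t_new in zip(sizes, old_times, new_times):
        rows.append((
            n,
            fft_gflops(n, batch, t_old),
            fft_gflops(n, batch, t_new),
            speedup(t_old, t_new),
        ))
    return rows
```

Reporting GFLOP/s rather than raw times makes small and large sizes comparable on one scale, while the speedup column directly answers whether "a factor of three" holds per size; for example, `speedup(3.0, 1.0)` is `3.0`.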

CUFFT in 2.3 is faster and now also supports double precision.


Thanks for the reply. Could you (or somebody else) comment on what was changed in the FFT library to make it perform that much better? Were the older libraries (shipped with CUDA 2.2 and earlier) not optimized for larger FFT sizes?