Performance degradation in CUFFT 3.2


In my code, I make heavy use of out-of-place real-to-complex and
complex-to-real FFTs at many different sizes. Motivated by the
release highlights, which announce significantly improved FFT
performance, I updated the CUDA toolkit from 3.1 to 3.2.

However, I found that CUFFT 3.2 performs considerably worse than
release 3.1. As a test, I directly compared the runtimes of both
versions in a toy program with array sizes up to 8192 elements,
which confirmed the finding.

Interestingly, when I profile the application with the NVIDIA
Visual Profiler, it reports a slightly improved runtime with
toolkit 3.2, in contrast to what I measure when I time the
application myself.

Furthermore, though less of a problem for me, I’ve noticed
somewhat higher memory consumption.

Has anyone observed similar behavior?

GeForce GTX 480
Ubuntu, 2.6.32 kernel, 64 bit


I upgraded from 3.0 to 3.2 and found:

  1. CUFFT uses more memory

  2. CUFFT throws first-chance exceptions when loading

  3. A cufftHandle with the value NULL is a valid plan handle!

Still no idea whether these are features or bugs :-)



Well, I’m seeing disappointingly low performance on a Tesla card with 3D FFTs. I’ve just started with CUDA 3.2, so I don’t know whether this problem was already present in older releases. Perhaps you could post some of your results (dimensions and recorded execution times) for comparison?
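
To make posted results comparable across sizes and cards, it may help to convert each (dimensions, time) pair into a nominal rate. The sketch below uses the common FFT benchmarking convention of 5 N log2(N) floating-point operations for a complex transform of N total points. This is a nominal figure, not the actual operation count, and real-to-complex results are often quoted at half of it. The function name is made up for illustration.

```cpp
#include <cmath>

// Hypothetical helper: nominal GFLOP/s for a complex FFT of nx*ny*nz points
// that took time_ms milliseconds, using the 5 N log2(N) convention.
// Pass ny = nz = 1 for a 1D transform. Returns -1 on invalid input.
double nominal_fft_gflops(int nx, int ny, int nz, double time_ms)
{
    if (nx < 1 || ny < 1 || nz < 1 || time_ms <= 0.0) return -1.0;

    double n     = (double)nx * ny * nz;                 // total points
    double flops = 5.0 * n * std::log(n) / std::log(2.0); // 5 N log2(N)

    return flops / (time_ms * 1.0e-3) / 1.0e9;
}
```

For example, a 128 x 128 x 128 complex transform that takes 1 ms corresponds to about 220 nominal GFLOP/s.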