speed_fft on a

Hello, I ran the speed_fft test from the MATLAB plug-in for Windows (Matlab_CUDA-1.1a).
Configuration:
CPU: Intel Xeon E5540, 64-bit (quad-core)
Graphics card: Quadro FX 3800
MATLAB R2009a (multithreading disabled using the maxNumCompThreads(1) command)
Windows XP Pro 64-bit
Visual C++ 2005
CUDA 2.2 Drivers

The results are surprising:


The CUDA results match those reported here: www.ll.mit.edu/HPEC/agendas/proc07/Day3/11_Fatica_Poster.ppt
The CPU seems to be about twice as fast, even though I disabled multithreading (the results with multithreading enabled are the same).
What do you think of these results? Can I optimize the memory transfers from host to device? Is it better to use the CPU than CUDA for FFTs?