I’m going to use CUDA and CUFFT for some image processing functions. Before implementing them, I want to know what performance gain is possible on my 8800GTX, so I wrote some simple benchmarks comparing CUFFT and FFTW. The results are promising: CUFFT is faster in roughly 75% of the cases I tested.
One thing I ran into, however, is that CUFFT has a relatively large overhead for planning the FFT. In most of the cases where CUFFT is slower than FFTW, the planning step is the reason. In FFTW I used FFTW_ESTIMATE for planning, since it turned out to be the most efficient for me. Since the CUFFT API is modelled after FFTW, I’d expect the type of planning to be a parameter of the CUFFT plan functions as well.
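To make the planning overhead concrete, here is a rough sketch of how I think about the break-even point. It assumes a simple cost model in which the total time is one planning call plus n transforms that reuse the same plan. The names `FftCost`, `total_ms` and `break_even` are my own for illustration and are not part of CUFFT or FFTW, and the numbers mentioned afterwards are made up, not measurements from my benchmark.

```cpp
#include <cmath>

// Hypothetical cost model (an assumption, not a library API):
//   total time = plan_ms + n * exec_ms
struct FftCost {
    double plan_ms;  // one-time cost of creating the plan
    double exec_ms;  // cost of one transform that reuses the plan
};

// Total time for n transforms under the model above.
double total_ms(const FftCost& c, long n) {
    return c.plan_ms + static_cast<double>(n) * c.exec_ms;
}

// Smallest n >= 1 for which `gpu` is strictly faster than `cpu` in total.
// Returns -1 if the per-transform gain never makes up for the planning gap.
long break_even(const FftCost& gpu, const FftCost& cpu) {
    // Already ahead after a single transform?
    if (total_ms(gpu, 1) < total_ms(cpu, 1)) {
        return 1;
    }
    const double plan_gap  = gpu.plan_ms - cpu.plan_ms;  // extra planning cost
    const double exec_gain = cpu.exec_ms - gpu.exec_ms;  // saving per transform
    if (exec_gain <= 0.0) {
        return -1;  // no per-transform saving, so the gap is never recovered
    }
    // Need n * exec_gain > plan_gap, i.e. n > plan_gap / exec_gain.
    return static_cast<long>(std::floor(plan_gap / exec_gain)) + 1;
}
```

For example, with made-up costs of 40 ms planning and 0.5 ms per transform on the GPU against 1 ms planning and 2 ms per transform on the CPU, `break_even({40.0, 0.5}, {1.0, 2.0})` gives 27: the GPU only comes out ahead once the same plan is reused for at least 27 transforms. This is why a cheaper planning mode would matter most when only a few transforms of a given size are run.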
Are there any (undocumented) solutions or work-arounds to change the type of planning for CUFFT, or are there plans to support other types of FFT planning in the near future?
If so, will there also be options like saving ‘wisdom’ for CUFFT?
Thanks in advance for all replies.