CUFFT source code


Thank you for the source code for CUFFT and CUBLAS. I am working on a project that requires me to modify the CUFFT source so that it runs in streams and overlaps data transfers with computation. It is a proof of concept to determine whether NVIDIA cards can handle the workload our application needs.
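For comparison, later CUFFT releases expose `cufftSetStream`, which binds a plan to a CUDA stream without touching the library source. The sketch below assumes a CUFFT version that provides that call, so it is not a patch to the source discussed in this thread. Each of two streams gets its own plan and device buffer, and pinned host memory (`cudaMallocHost`) lets `cudaMemcpyAsync` in one stream run while the other stream is transforming, on devices that support concurrent copy and execution. The helper `streamedRoundTripError`, the stream count, and the test data are hypothetical and chosen only for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: runs a forward + inverse C2C round trip on `count`
// nx-by-ny images, spreading them over two streams so that copies in one
// stream can overlap transforms in the other. Returns the largest deviation
// from the input after undoing CUFFT's unnormalized scaling.
float streamedRoundTripError(int nx, int ny, int count)
{
    const int kStreams = 2;
    const int n = nx * ny;
    const size_t bytes = sizeof(cufftComplex) * n;

    cufftComplex* h_data;                        // pinned: needed for async overlap
    cudaMallocHost(&h_data, bytes * count);
    for (int i = 0; i < n * count; ++i) {
        h_data[i].x = float(i % 13);
        h_data[i].y = 0.0f;
    }

    cudaStream_t streams[kStreams];
    cufftHandle plans[kStreams];
    cufftComplex* d_buf[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cufftPlan2d(&plans[s], nx, ny, CUFFT_C2C);
        cufftSetStream(plans[s], streams[s]);    // exec calls on this plan go to streams[s]
        cudaMalloc(&d_buf[s], bytes);
    }

    // Image k goes to stream k % kStreams. Work inside one stream is ordered,
    // so reusing d_buf[s] for successive images is safe.
    for (int k = 0; k < count; ++k) {
        int s = k % kStreams;
        cufftComplex* h = h_data + (size_t)k * n;
        cudaMemcpyAsync(d_buf[s], h, bytes, cudaMemcpyHostToDevice, streams[s]);
        cufftExecC2C(plans[s], d_buf[s], d_buf[s], CUFFT_FORWARD);
        cufftExecC2C(plans[s], d_buf[s], d_buf[s], CUFFT_INVERSE);
        cudaMemcpyAsync(h, d_buf[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    // Forward followed by inverse scales the data by n, so divide it back out.
    float maxErr = 0.0f;
    for (int i = 0; i < n * count; ++i) {
        float expected = float(i % 13);
        maxErr = std::fmax(maxErr, std::fabs(h_data[i].x / n - expected));
        maxErr = std::fmax(maxErr, std::fabs(h_data[i].y / n));
    }

    for (int s = 0; s < kStreams; ++s) {
        cufftDestroy(plans[s]);
        cudaFree(d_buf[s]);
        cudaStreamDestroy(streams[s]);
    }
    cudaFreeHost(h_data);
    return maxErr;
}

int main()
{
    printf("max round-trip error: %g\n", streamedRoundTripError(512, 512, 8));
    return 0;
}
```

Whether the copies and transforms actually overlap depends on the device; the profiler timeline is the place to confirm it.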

Running CUFFT code in the profiler, I notice that not all of the CUFFT source is provided. For example, routines such as c2c_radix2_mpsm and c2c_radix2_mpgm show up in the profiler but not in the source release. Routines such as c2c_twiddle and c2c_transpose are also missing. The host source that builds the cufftPlan would be extremely useful as well.

Is there any plan to release the full source for CUFFT anytime soon?

Thank you,

Hello again,

Well, judging from the number of hits on this topic, it looks like plenty of people are interested in getting the full CUFFT source code; however, no one from NVIDIA has replied either way about whether there is any plan to release it.

Are there any NVIDIA folks out there who can comment on this please?

My 2 cents: releasing the source would help by
(1) finding issues in the source
(2) accelerating development for custom uses
(3) getting feedback to NVIDIA about features we'd like to see
(4) speeding up proof-of-concept development, which would ultimately help NVIDIA sell its hardware

What does everyone else think?


Can you share your stream-enabled CUFFT code? I would find it useful for improvements to my CUDA project.


I don’t have any stream enabled CUFFT code at the moment. As the code released to us is incomplete, it would take quite a bit of work to code up the remaining routines myself.

Are you also interested in asking NVIDIA to release all the CUFFT code? I would encourage you to post a request as well. They have not commented either way to me.


I couldn’t agree with you more!

Some issues in CUFFT will hamper the spread of CUDA, especially in commercial applications.

Where can I obtain the source code? I can’t find it.


Don’t know if there has been any movement on getting access to the CUFFT source code.

I am developing an application that does a 2-D FFT (512x512 or 1024x1024), then an elementwise multiply with another 512x512 matrix together with some scaling, and finally an inverse FFT. I'm doing roughly 50,000 of these in a run.
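A minimal sketch of that pipeline with the public CUFFT API, assuming a single-precision C2C 2-D plan and that the static matrix has already been transformed to the frequency domain once and kept on the device. CUFFT transforms are unnormalized, so a forward transform followed by an inverse one scales the data by nx*ny; the sketch folds 1/(nx*ny) into the multiply kernel, and any application-specific scaling could go into the same factor. The names `multiplyAndScale`, `filterImage` and `roundTripError` are hypothetical.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Multiplies each spectrum element by the matching filter element and applies
// a real scale factor (here 1/(nx*ny) to undo CUFFT's unnormalized round trip).
__global__ void multiplyAndScale(cufftComplex* data, const cufftComplex* filter,
                                 int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex a = data[i], b = filter[i], c;
        c.x = (a.x * b.x - a.y * b.y) * scale;
        c.y = (a.x * b.y + a.y * b.x) * scale;
        data[i] = c;
    }
}

// Hypothetical helper: forward FFT, pointwise multiply + scale, inverse FFT,
// all in place on one nx-by-ny image already resident on the device.
// d_filter must already hold the static matrix in the frequency domain.
cufftResult filterImage(cufftHandle plan, cufftComplex* d_image,
                        const cufftComplex* d_filter, int nx, int ny)
{
    const int n = nx * ny;
    cufftResult r = cufftExecC2C(plan, d_image, d_image, CUFFT_FORWARD);
    if (r != CUFFT_SUCCESS) return r;
    const int threads = 256;
    multiplyAndScale<<<(n + threads - 1) / threads, threads>>>(d_image, d_filter, n, 1.0f / n);
    return cufftExecC2C(plan, d_image, d_image, CUFFT_INVERSE);
}

// Self-check: with an all-ones filter spectrum the pipeline should return the
// input unchanged, which exercises the 1/(nx*ny) normalization.
float roundTripError(int nx, int ny)
{
    const int n = nx * ny;
    std::vector<cufftComplex> h_image(n), h_filter(n), h_out(n);
    for (int i = 0; i < n; ++i) {
        h_image[i].x = float(i % 7); h_image[i].y = 0.0f;
        h_filter[i].x = 1.0f;        h_filter[i].y = 0.0f;
    }

    cufftComplex *d_image, *d_filter;
    cudaMalloc(&d_image, n * sizeof(cufftComplex));
    cudaMalloc(&d_filter, n * sizeof(cufftComplex));
    cudaMemcpy(d_image, h_image.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);
    cudaMemcpy(d_filter, h_filter.data(), n * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan2d(&plan, nx, ny, CUFFT_C2C);   // plan once, reuse for every image in a run
    filterImage(plan, d_image, d_filter, nx, ny);
    cudaMemcpy(h_out.data(), d_image, n * sizeof(cufftComplex), cudaMemcpyDeviceToHost);

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxErr = std::fmax(maxErr, std::fabs(h_out[i].x - h_image[i].x));
        maxErr = std::fmax(maxErr, std::fabs(h_out[i].y));
    }
    cufftDestroy(plan);
    cudaFree(d_image);
    cudaFree(d_filter);
    return maxErr;
}

int main()
{
    printf("max error after filter round trip: %g\n", roundTripError(512, 512));
    return 0;
}
```

Fusing the scale into the multiply saves a separate pass over the data, but it does not change the internal transpose step: the public API offers no option to skip or reorder it.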

Looking at the profiler, my application is spending the following time (using CUFFT v2.2):

[codebox]c2c_radix2_sp    57%
c2c_transpose    14%
Other stuff      29%[/codebox]

I'm wondering whether I can reduce the c2c_transpose time. The matrices I multiply by are static for the whole run, so they could be pre-processed (transposed in advance) to cut the time the FFT spends transposing in the frequency domain.

Source code would be great, since it would let me take a look. Alternatively, is there any way to unwrap the FFT call to reduce the amount of transposing it does at the end?

Cheers, Nick

Try 2.3; the power-of-2 transforms are 3x faster.

Thanks for the tip. I would like to try 2.3. The problem is that I am developing on a laptop (Dell D830) with a built-in Quadro NVS 135M. I can get driver v185.85 to recognise the card, but v190.38 is having none of it. That makes developing with v2.3 kind of hard! Maybe it's time to invest in a new laptop…