When you create a CUFFT plan with a large batch count for a big problem, the plan can occupy as much as 512 MB of memory, when it is actually possible to use just 64 MB and get the same performance!
What you do is create a plan for NbBatches/8 batches and then call the cufftExecC2C(…) function in a loop of 8 iterations, advancing the input and output buffer pointers by one chunk each time.
For large problems this is just as fast and takes far less space…
Now my question: why doesn’t the CUFFT API handle this “under the hood”?
If I remember correctly, this is a known problem and the CUFFT team was planning to fix it in an upcoming release.
OK, sounds great. Good to know!