I have code that has been working fine with CUDA 8 and 9, and I just recompiled it with CUDA 10.
Part of the code involves large 3D FFTs, R2C and C2R, with dimensions chosen so that their only prime factors are 2, 3, and 5. The code runs fine.
The major difference I noticed is that the FFT plan work size (workSize) has doubled, from approximately 1 volume to 2 volumes. All processing is in 32-bit float. I need as much of the available CUDA memory as I can get, so this is a disadvantage in CUDA 10.
Here are some code snippets:
cufftCreate(&m_pPlanFwd3D);
cufftSetAutoAllocation(m_pPlanFwd3D, 0);
cufftMakePlan3d(m_pPlanFwd3D, m_Z, m_Y, m_X, CUFFT_R2C, &workSize1);
cufftGetSize(m_pPlanFwd3D, &workSize1);
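To put the reported work size in terms of "volumes", here is a rough sketch of the arithmetic. It assumes that one volume means either the real input (Z*Y*X floats) or the R2C output (Z*Y*(X/2+1) complex values); the helper names are hypothetical and are not part of cuFFT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of cuFFT) for expressing a plan's
// work size as a multiple of the data volume.

// Bytes in the real-valued input volume of 32-bit float samples.
std::size_t realVolumeBytes(std::size_t nz, std::size_t ny, std::size_t nx) {
    return nz * ny * nx * sizeof(float);
}

// Bytes in the R2C output volume: the innermost dimension holds
// nx/2 + 1 complex values, each made of two 32-bit floats.
std::size_t complexVolumeBytes(std::size_t nz, std::size_t ny, std::size_t nx) {
    return nz * ny * (nx / 2 + 1) * 2 * sizeof(float);
}

// Work size (in bytes, as written to workSize1 above) divided by the
// size of one volume.
double workSizeInVolumes(std::size_t workSizeBytes, std::size_t volumeBytes) {
    assert(volumeBytes > 0);  // guard against division by zero
    return static_cast<double>(workSizeBytes) / static_cast<double>(volumeBytes);
}
```

Calling workSizeInVolumes(workSize1, complexVolumeBytes(m_Z, m_Y, m_X)) under each CUDA version should give roughly 1 before and roughly 2 after, if the doubling is as described above.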
Has anyone else noticed this, or am I doing something wrong? I don’t notice a major performance difference.