I think I cannot do this, but I wanted to confirm:
I want to call cufftExecC2C (or really any CUFFT routine) from within different
streams. I believe the CUFFT calls are not callable from within kernel
routines, and I think that means I am out of luck. I had a simple
kernel defined like this:
__global__ void
fftKernel(cufftHandle fftPlan, cufftComplex *d_fftArrayA, cufftComplex *d_fftArrayB)
{
    // now call our fft
    CUFFT_SAFE_CALL(cufftExecC2C(fftPlan, d_fftArrayA, d_fftArrayB, CUFFT_FORWARD));
}
I get nvcc compile errors for this; I believe it is unhappy about the attempt to call
the CUFFT routine from inside another kernel. I recall reading somewhere that
CUFFT calls are essentially kernel launches in and of themselves.
So: is there a way to call a CUFFT routine within a stream? Thoughts on this would
be appreciated, thanks.
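For reference, the plain host-side version of the same call (no streams), which does compile, looks roughly like this; NX is just a placeholder size and error checking is omitted:

#include <cufft.h>
#include <cuda_runtime.h>

#define NX 1024   // placeholder transform size

int main(void)
{
    cufftHandle fftPlan;
    cufftComplex *d_fftArrayA, *d_fftArrayB;

    cudaMalloc((void**)&d_fftArrayA, sizeof(cufftComplex) * NX);
    cudaMalloc((void**)&d_fftArrayB, sizeof(cufftComplex) * NX);

    cufftPlan1d(&fftPlan, NX, CUFFT_C2C, 1);                          // 1D complex-to-complex plan, batch of 1
    cufftExecC2C(fftPlan, d_fftArrayA, d_fftArrayB, CUFFT_FORWARD);   // host-side call that launches CUFFT's own kernels

    cufftDestroy(fftPlan);
    cudaFree(d_fftArrayA);
    cudaFree(d_fftArrayB);
    return 0;
}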
I am not sure it will work with streams; you may have to modify the source code. Also remember that in order to use async calls, the data needs to be in pinned memory.
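To make the pinned-memory point concrete, here is a rough sketch of what an async version would look like, assuming a CUFFT release that exposes cufftSetStream (check your cufft.h, older releases may not have it). NX is a placeholder size and error checking is omitted:

#include <cufft.h>
#include <cuda_runtime.h>

#define NX 1024   // placeholder transform size

int main(void)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Host buffer must be pinned (page-locked) for cudaMemcpyAsync to actually be asynchronous.
    cufftComplex *h_data;
    cudaMallocHost((void**)&h_data, sizeof(cufftComplex) * NX);

    cufftComplex *d_in, *d_out;
    cudaMalloc((void**)&d_in,  sizeof(cufftComplex) * NX);
    cudaMalloc((void**)&d_out, sizeof(cufftComplex) * NX);

    cufftHandle plan;
    cufftPlan1d(&plan, NX, CUFFT_C2C, 1);
    cufftSetStream(plan, stream);          // CUFFT's internal kernels then launch into this stream

    cudaMemcpyAsync(d_in, h_data, sizeof(cufftComplex) * NX,
                    cudaMemcpyHostToDevice, stream);
    cufftExecC2C(plan, d_in, d_out, CUFFT_FORWARD);   // still a host-side call, enqueued on 'stream'
    cudaStreamSynchronize(stream);

    cufftDestroy(plan);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFreeHost(h_data);
    cudaStreamDestroy(stream);
    return 0;
}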
Has anyone got cufft running in a stream yet? Also, when I run cufft through the profiler, I see kernel code with _mpsm and _mpgm extensions. I assume this has to do with shared-memory and global-memory access? I don’t see this in the CUFFT source code release… do we have the complete source code needed to get performance equivalent to the host-callable CUFFT routines?
I thought batched FFTs only change the grid dimensions in the CUFFT kernel code. Have you had success using batched FFTs and launching CUFFT on a stream?
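In case it helps, this is the kind of batched plan I mean; a rough sketch where NX and BATCH are placeholder sizes and error checking is omitted:

#include <cufft.h>
#include <cuda_runtime.h>

#define NX    1024   // placeholder: length of each signal
#define BATCH 8      // placeholder: number of signals transformed in one launch

int main(void)
{
    cufftComplex *d_data;
    cudaMalloc((void**)&d_data, sizeof(cufftComplex) * NX * BATCH);

    cufftHandle plan;
    cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH);          // batch > 1 widens the grid of CUFFT's kernels
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD); // in-place transform of all BATCH signals

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}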
I have noticed that not all of the CUFFT code is provided to us, so it will take some time to get it working with streams if we have to modify the source. I have posted a request to NVIDIA to see whether they have any advice or (preferably) will make the entire CUFFT library source code available to us. No answer either way yet… :(