Computing multiple FFTs on a single GPU

I am using cufft.h to compute FFTs. My requirement is to compute the FFT of four different sequences at the same time using streams. Earlier I used this library to compute the FFT of a single sequence and it worked, but now it throws errors when I call the cufft functions from a kernel. The error says that a host function cannot be called from a global function.

How can I compute the FFT from a kernel? I don't want to copy the data from device to host, calculate the FFT, and then copy it back to the device for further calculation.

Please help
Thanks in advance

You can’t call CUFFT functions from a kernel. They are host-side library calls that launch their own kernels on the GPU.

If I wanted to compute multiple FFTs at once, the first thing I would do is investigate the batch capability in CUFFT.
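As a rough illustration of the batch approach, here is a minimal sketch using cufftPlanMany. The helper name batched_forward_fft is made up for this example, and it assumes the sequences are single-precision complex, all the same length, and stored back to back in one device buffer.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper (not part of cuFFT): forward-transforms `batch`
// complex sequences of length `n` each, in place.
// Assumption: sequence k starts at d_data + k * n (contiguous layout).
bool batched_forward_fft(cufftComplex* d_data, int n, int batch)
{
    cufftHandle plan;
    int dims[1] = { n };

    // Passing NULL for inembed/onembed selects the default contiguous
    // layout, so the stride arguments are ignored by cuFFT.
    if (cufftPlanMany(&plan, 1, dims,
                      NULL, 1, n,      // input layout
                      NULL, 1, n,      // output layout
                      CUFFT_C2C, batch) != CUFFT_SUCCESS) {
        return false;
    }

    // Called from host code; d_data is a device pointer.
    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);

    // Wait for the transforms to finish before reporting success.
    cudaError_t e = cudaDeviceSynchronize();
    cufftDestroy(plan);
    return r == CUFFT_SUCCESS && e == cudaSuccess;
}
```

For the four sequences in the question, this would be called once with batch set to 4, instead of creating four plans and four streams.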

If the data to be transformed is already on the device, there is no need to copy it to the host to use CUFFT. Pass pointers to the data directly to your CUFFT calls.
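To make that concrete, here is a small sketch of a kernel, an FFT, and another kernel all working on the same device buffer with no host copies in between. The scale kernel and the scale_fft_scale helper are invented for this example and stand in for whatever processing you do before and after the transform. Attaching the plan to a stream with cufftSetStream is optional, but it keeps the three steps ordered on one stream.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical stand-in for your own processing: scales each sample in place.
__global__ void scale(cufftComplex* data, int count, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}

// Hypothetical helper: kernel -> FFT -> kernel on one device buffer.
// d_data must already point to n complex samples in device memory.
bool scale_fft_scale(cufftComplex* d_data, int n, float pre, float post)
{
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    cufftHandle plan;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) {
        cudaStreamDestroy(stream);
        return false;
    }
    // Run the FFT on the same stream as the surrounding kernels.
    cufftSetStream(plan, stream);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    scale<<<blocks, threads, 0, stream>>>(d_data, n, pre);
    // Host-side call, but the pointer it receives is the device pointer.
    cufftResult r = cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    scale<<<blocks, threads, 0, stream>>>(d_data, n, post);

    cudaError_t e = cudaStreamSynchronize(stream);
    cufftDestroy(plan);
    cudaStreamDestroy(stream);
    return r == CUFFT_SUCCESS && e == cudaSuccess;
}
```

The CUFFT call sits between the two kernel launches in host code; it is the launch order on the stream, not a call from inside a kernel, that chains the steps together.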