How to use cuFFT: problem calling cuFFT functions inside a kernel

Hello, I'm trying to do parallel computing with a __global__ kernel and call cuFFT functions inside it. However, it seems that
cuFFT functions have to be called from the host, not from the device. Is there any way I can combine my parallel kernel with cuFFT?
Can I call cuFFT from a __global__ function?


Well, how can I use cuFFT then? The structure is like this:

for loop {

for loop {


cufft functions…

for loop {



where the for loops are inside the kernel. I have thought of using a __device__ function, but I still have no idea how to deal with the cuFFT functions. Does anyone have an idea?

int main(){
  <init cuda>
  <init cufft>
  <init a single stream - perform next calls in the stream using asynchronous calls>

  for loop{
   <run your looping-kernel>
   <call cufft functions>
   <run your second looping-kernel>
  }

  <check result!>
  <clean up!>
  return 0;
}

Queuing everything on a single stream should reduce or remove the gaps between calls, so the GPU should stay busy the whole time.
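
To make the outline above more concrete, here is a minimal sketch of one way it could look. It is only an illustration under some assumptions: the kernel `scale`, the function `run_fft_loop`, and the doubling factor are made-up names and values, not part of your code, and the "cufft functions" step is assumed to be a 1D complex-to-complex forward transform followed by an inverse one. The cuFFT calls stay on the host; only the per-element work runs in kernels.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical per-element kernel: multiplies every sample by `factor`.
// It stands in for "your looping-kernel" from the outline.
__global__ void scale(cufftComplex* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}

// Runs `iterations` rounds of kernel -> cuFFT -> kernel on one stream.
// Returns 0 on success, 1 on a CUDA runtime error, 2 on a cuFFT error.
int run_fft_loop(cufftComplex* h_data, int n, int iterations) {
    const size_t bytes = n * sizeof(cufftComplex);
    cufftComplex* d_data = nullptr;
    cudaStream_t stream;
    cufftHandle plan;

    // Setup. Error paths return early without freeing, to keep the sketch short.
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;
    if (cudaMalloc((void**)&d_data, bytes) != cudaSuccess) return 1;
    if (cufftPlan1d(&plan, n, CUFFT_C2C, 1) != CUFFT_SUCCESS) return 2;
    // Make cuFFT enqueue its work on the same stream as the kernels.
    if (cufftSetStream(plan, stream) != CUFFT_SUCCESS) return 2;

    cudaMemcpyAsync(d_data, h_data, bytes, cudaMemcpyHostToDevice, stream);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int status = 0;
    for (int it = 0; it < iterations && status == 0; ++it) {
        // First kernel: assumed pre-processing step (doubles every sample).
        scale<<<blocks, threads, 0, stream>>>(d_data, n, 2.0f);

        // cuFFT calls are issued from the host, in place on device memory.
        if (cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD) != CUFFT_SUCCESS) status = 2;
        if (cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE) != CUFFT_SUCCESS) status = 2;

        // Second kernel: cuFFT transforms are unnormalized, so divide by n.
        scale<<<blocks, threads, 0, stream>>>(d_data, n, 1.0f / n);
    }

    cudaMemcpyAsync(h_data, d_data, bytes, cudaMemcpyDeviceToHost, stream);
    // Wait once at the end instead of after every call.
    if (cudaStreamSynchronize(stream) != cudaSuccess && status == 0) status = 1;
    if (cudaGetLastError() != cudaSuccess && status == 0) status = 1;

    cufftDestroy(plan);
    cudaFree(d_data);
    cudaStreamDestroy(stream);
    return status;
}
```

Build it with nvcc and link against cuFFT (-lcufft). With this particular choice of kernels, each round should simply double the input, because the forward/inverse pair cancels out after the 1/n scaling. That makes it easy to check that the pipeline is wired up correctly before you swap in your real kernels.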

Hope you’ll solve it.


I have run into the same situation as MasterKitten.
How can we run FFTs inside a for loop efficiently with CUDA?

Can you describe your problem in more detail? What about the solution Glupol proposed?