Are the cuFFT library calls asynchronous?

Hello,

Are the cuFFT calls asynchronous? I have an iterative process in which the function update

__host__ void update(cufftDoubleReal *dbbff, cufftDoubleReal *dppsi, double *ddqq,
                     cufftDoubleReal *hbbff, cufftDoubleReal *hppsi,
                     int llx, int lly, int totsize, int totsize_pad, int totsize_invspa,
                     double rr, const double q0, double ddt,
                     cufftHandle pprc, cufftHandle ppcr, dim3 ggrid, dim3 tthreads)
{
    nonlinterm<<<ggrid, tthreads>>>(dbbff, dppsi, totsize_pad);
    cufftExecD2Z(pprc, dbbff, (cufftDoubleComplex*)dbbff);
    cufftExecD2Z(pprc, dppsi, (cufftDoubleComplex*)dppsi);
    kupdt<<<ggrid, tthreads>>>((cufftDoubleComplex*)dppsi, (cufftDoubleComplex*)dbbff, ddqq, totsize_invspa, totsize, rr, ddt, q0);
    cufftExecZ2D(ppcr, (cufftDoubleComplex*)dppsi, dppsi);
}

is called nout times. At that point I copy the data to the CPU to check for convergence. I noticed that running nout*10 (10 blocks of nout calls) takes the same time as (nout/10)x100 (100 blocks of nout/10 calls). This makes me believe that the cuFFT calls are blocking. Is this right? Can the calls be made asynchronous? For iterative processes I think this might improve performance.
(I tried to look in the manual, but it only mentions streams. I have only one stream.)
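
A minimal sketch of how this could be checked directly: time how long the host spends issuing a batch of cufftExecD2Z calls, and compare that with the time until cudaDeviceSynchronize returns. The grid size nx, ny, the repeat count reps, and the helper padded_doubles are placeholders made up for this test, not values from my real code.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: number of doubles needed for an in-place 2D D2Z
// transform (the last dimension is padded to 2*(ny/2+1)).
static size_t padded_doubles(int nx, int ny)
{
    return (size_t)nx * 2 * (ny / 2 + 1);
}

int main()
{
    const int nx = 1024, ny = 1024;  // placeholder grid size (assumption)
    const int reps = 200;            // placeholder repeat count (assumption)

    cufftDoubleReal *d = nullptr;
    size_t bytes = padded_doubles(nx, ny) * sizeof(double);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    cudaMemset(d, 0, bytes);         // all zeros, so repeated FFTs stay finite

    cufftHandle plan;
    if (cufftPlan2d(&plan, nx, ny, CUFFT_D2Z) != CUFFT_SUCCESS) return 1;

    // Warm-up call so one-time setup cost is not part of the measurement.
    cufftExecD2Z(plan, d, (cufftDoubleComplex*)d);
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        cufftExecD2Z(plan, d, (cufftDoubleComplex*)d);
    auto t1 = std::chrono::steady_clock::now();  // host is done issuing calls
    cudaDeviceSynchronize();
    auto t2 = std::chrono::steady_clock::now();  // GPU has finished all work

    double issue_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    double total_ms = std::chrono::duration<double, std::milli>(t2 - t0).count();
    printf("issue: %.3f ms, total: %.3f ms\n", issue_ms, total_ms);

    cufftDestroy(plan);
    cudaFree(d);
    return 0;
}
```

This should build with something like nvcc test.cu -lcufft. If the issue time comes out much smaller than the total time, the calls return before the GPU has finished; if the two are about equal, the calls are effectively blocking on the host.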

Here is the call sequence in the main function:

for(int sss=1;sss<=nsteps;sss++)
    {

    for(int n=1;n<=nend;n++)
    {
        update(dbff,dpsi,dqq,hbff,hpsi,lx,ly,totsize,totsize_pad,totsize_invspa,r,q0,dt,prc,pcr,grid,threads);
    }
    
    CUDA_CHECK( cudaMemcpy(hpsi, dpsi, sizeof(double)*totsize_pad,cudaMemcpyDeviceToHost) );    
   
// Start of energy function    
   ene[sss]=energy(dbff,dpsi,dqq,hbff,hpsi,lx,ly,totsize,totsize_pad,totsize_invspa,r,q0,prc,pcr,grid,threads);
   pFile=fopen("enespeed.txt","a");
   printf("%26.20lf %26.20lf\n",ene[sss],(r+pow(q0,4))*pm*pm/2.0+pm*pm*pm*pm/4.0); 
   fprintf(pFile,"%d %26.20lf %26.20lf\n",sss,ene[sss],(r+pow(q0,4))*pm*pm/2.0+pm*pm*pm*pm/4.0); 
   fclose(pFile);    
// some simple CPU stuff + saving data
   printf("%d %d\n",sss,cpart);

is called nout times,