Asynchronous H2D transfer during kernel execution

Hi,

I am trying to develop an application in which I need to refresh a buffer that a kernel was launched with. I tried using async transfers with streams and double buffering, but I guess that it's not possible to make a kernel use the refreshed state while it is executing. I might be wrong, but I feel async transfer helps only for D2H (device-to-host) transfers.

  1. Does async transfer work only for D2H, or for H2D (host-to-device) as well? If it does, can I do the following:

    // Launch configuration order: grid, block, shared-memory bytes, stream
    Call_kernel<<<…, …, 0, stream0>>>(d_a, d_b);
    while (cudaEventQuery(stop) == cudaErrorNotReady)
    {
        cudaMemcpyAsync(d_a, &h_a, sizeof(int), cudaMemcpyHostToDevice, stream0);
    }

  2. Is it possible to call cudaMemcpy() from a CPU thread while the kernel is executing, to do the same as above?

Please advise…

Thanks

Note that CUDA preserves the ordering of operations in the same CUDA stream. If you want your async copy to overlap with kernel execution, it needs to be on a different stream.
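
To illustrate the point about streams, here is a minimal sketch of an H2D copy issued on a different stream from the kernel. The kernel `busy_kernel`, the stream names, and the sizes are made up for illustration; they are not from the original code. It also assumes the host buffer is pinned (allocated with `cudaMallocHost`), which async copies generally need in order to overlap with a running kernel, and that the device has a copy engine available. The kernel here deliberately does not touch `d_a`, so there is no race.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for Call_kernel: squares each element of d_b in place.
__global__ void busy_kernel(float *d_b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_b[i] = d_b[i] * d_b[i];
}

int main() {
    const int n = 1 << 20;
    int *h_a = nullptr, *d_a = nullptr;
    float *d_b = nullptr;
    cudaStream_t kernelStream, copyStream;

    cudaMallocHost(&h_a, sizeof(int));      // pinned host memory (assumption)
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemset(d_b, 0, n * sizeof(float));
    cudaStreamCreate(&kernelStream);
    cudaStreamCreate(&copyStream);

    *h_a = 42;

    // Kernel goes on one stream; note the stream is the 4th launch parameter.
    busy_kernel<<<(n + 255) / 256, 256, 0, kernelStream>>>(d_b, n);

    // The copy goes on a different stream, so it is not ordered behind the kernel.
    cudaMemcpyAsync(d_a, h_a, sizeof(int), cudaMemcpyHostToDevice, copyStream);

    // Wait for both streams before reading results back.
    cudaStreamSynchronize(kernelStream);
    cudaStreamSynchronize(copyStream);

    int check = 0;
    cudaMemcpy(&check, d_a, sizeof(int), cudaMemcpyDeviceToHost);
    printf("d_a = %d\n", check);

    cudaStreamDestroy(kernelStream);
    cudaStreamDestroy(copyStream);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFreeHost(h_a);
    return 0;
}
```

Whether the copy and the kernel actually overlap in time depends on the hardware and driver; a profiler timeline is the usual way to confirm it.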

Agreed, it was my mistake to use the same stream in the code.
So, if I do something like this:

// Launch configuration order: grid, block, shared-memory bytes, stream
Call_kernel<<<…, …, 0, stream0>>>(d_a, d_b);

while (cudaEventQuery(stop) == cudaErrorNotReady)
{
    cudaMemcpyAsync(d_a, &h_a, sizeof(int), cudaMemcpyHostToDevice, stream1);
}

How can the kernel see the fresh value of d_a after h_a is copied to it asynchronously?
Do I have to use a separate buffer for this purpose?

Thanks for the suggestions…
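
One possible approach, sketched below under several assumptions, is to have the kernel read `d_a` through a `volatile` pointer so that each read goes back to memory instead of reusing a cached register value. The kernel `wait_for_update`, the `d_seen` buffer, and the spin limit are hypothetical names and values chosen for this example. The sketch assumes pinned host memory, non-blocking streams, and a device that can copy while a kernel runs. The timing of when a running kernel observes the copy is not something to rely on without testing on the target setup, so the spin is bounded to keep the example from hanging.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical polling kernel: a single thread spins until *flag becomes
// non-zero, or gives up after maxSpins reads. It stores the last value it read.
__global__ void wait_for_update(volatile int *flag, int *seen, long long maxSpins) {
    long long spins = 0;
    int v = 0;
    while ((v = *flag) == 0 && ++spins < maxSpins) { /* keep polling */ }
    *seen = v;
}

int main() {
    int *h_a = nullptr, *d_a = nullptr, *d_seen = nullptr;
    cudaStream_t kernelStream, copyStream;

    cudaMallocHost(&h_a, sizeof(int));      // pinned host memory (assumption)
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_seen, sizeof(int));
    cudaMemset(d_a, 0, sizeof(int));
    cudaMemset(d_seen, 0, sizeof(int));
    cudaDeviceSynchronize();                // make sure the memsets are done

    // Non-blocking streams do not implicitly synchronize with the default stream.
    cudaStreamCreateWithFlags(&kernelStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&copyStream, cudaStreamNonBlocking);

    // Start the kernel first; it polls d_a while it runs.
    wait_for_update<<<1, 1, 0, kernelStream>>>(d_a, d_seen, 1LL << 22);

    // Refresh d_a from the host on a different stream while the kernel spins.
    *h_a = 7;
    cudaMemcpyAsync(d_a, h_a, sizeof(int), cudaMemcpyHostToDevice, copyStream);

    cudaDeviceSynchronize();

    int seen = 0;
    cudaMemcpy(&seen, d_seen, sizeof(int), cudaMemcpyDeviceToHost);
    // seen is 7 if the kernel observed the update, 0 if the spin limit was hit.
    printf("kernel saw %d\n", seen);

    cudaStreamDestroy(kernelStream);
    cudaStreamDestroy(copyStream);
    cudaFree(d_a);
    cudaFree(d_seen);
    cudaFreeHost(h_a);
    return 0;
}
```

If the kernel needs a consistent snapshot of a larger buffer rather than a single word, a separate buffer is the safer choice: copy into the second buffer on the copy stream, then launch the next kernel on it once the copy has completed.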