cudaMemcpyAsync not returning any results

Hi All,

I’m using CUDA streams in a program of mine and I came across a problem related to cudaMemcpyAsync(). Here is the part of my code that relates to it.

calBoxVarienceKernal<<<blocksBoxVarienceKernal,threadsBoxVarienceKernal,0,c_stream>>>(gpu_idxBoxs,gpu_status,gpu_grid.ptr,gpu_grid.rows,varienceThresh, patt.cols);

cudaMemcpyAsync(host_status, gpu_status, patt.cols*sizeof(char), cudaMemcpyDeviceToHost, c_stream);

But when I check the ‘status’ array, I’m not getting any results. If I use ‘cudaMemcpy(status,gpu_status,patt.cols*sizeof(char),cudaMemcpyDeviceToHost);’ instead of ‘cudaMemcpyAsync’ in the same program, I do get results. Can someone please explain to me what’s wrong?



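Usually this type of problem indicates a synchronization issue. cudaMemcpyAsync can return control to the host before the copy has actually finished, so any host code that reads the destination buffer (for example, the final memcpy call you are making to copy host-to-host memory) needs the cudaMemcpyAsync call to be complete before it executes. If you make either a cudaThreadSynchronize or a cudaStreamSynchronize call (on the same stream the cudaMemcpyAsync call is using) before the memcpy call, I would think this would solve the problem. Note that cudaThreadSynchronize is deprecated in newer CUDA releases in favor of cudaDeviceSynchronize.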
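To make the suggestion concrete, here is a minimal sketch of the pattern, not your actual program. The kernel fillStatus, the helper copyStatusAsync, and the size n are hypothetical stand-ins for calBoxVarienceKernal and patt.cols. I am also assuming the host buffer is pinned (allocated with cudaMallocHost), since that is what lets the copy overlap with host execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for calBoxVarienceKernal: sets every status entry to 1.
__global__ void fillStatus(char *status, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) status[i] = 1;
}

// Runs the kernel and the async copy on one stream, waits for the stream,
// then counts entries that did not arrive as 1. Returns -1 on a CUDA error.
int copyStatusAsync(int n)
{
    char *host_status = nullptr;
    char *gpu_status = nullptr;
    cudaStream_t c_stream;

    if (cudaStreamCreate(&c_stream) != cudaSuccess) return -1;
    // Assumption: pinned host memory, so the copy can be truly asynchronous.
    if (cudaMallocHost((void **)&host_status, n * sizeof(char)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&gpu_status, n * sizeof(char)) != cudaSuccess) return -1;

    fillStatus<<<(n + 255) / 256, 256, 0, c_stream>>>(gpu_status, n);
    cudaMemcpyAsync(host_status, gpu_status, n * sizeof(char),
                    cudaMemcpyDeviceToHost, c_stream);

    // Block the host until the kernel and the copy queued on c_stream are done.
    // Reading host_status before this point may see stale data.
    cudaError_t err = cudaStreamSynchronize(c_stream);

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i)
            if (host_status[i] != 1) ++mismatches;
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        mismatches = -1;
    }

    cudaFree(gpu_status);
    cudaFreeHost(host_status);
    cudaStreamDestroy(c_stream);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", copyStatusAsync(1024));
    return 0;
}
```

If you comment out the cudaStreamSynchronize call, the loop over host_status may run before the copy has landed, which would match the symptom you describe. Checking the return value of the synchronize call is also a cheap way to catch errors from the kernel launch itself.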