Asynchronous call

Hi all,

I am new to CUDA programming and would like some help with asynchronous calls. Following is the code snap…

cudaMemcpyAsync(cudaMem, deviceMem, sizeof(unsigned char) * Size, cudaMemcpyHostToDevice, stream);
cudaThreadSynchronize();

Does the program wait for the memcpy to finish, or do I have to call cudaStreamSynchronize(stream) to wait?

Interesting question, as I’ve just come across an issue using “cudaThreadSynchronize()” after converting some of my memory transfers to async memcpys. As far as I understood from the Programming Guide, the cudaThreadSynchronize function makes sure “all streams” are finished before proceeding further. In my case, execution completed without any errors, but the final output image was missing some data. Using “cudaStreamSynchronize(0)” instead appears to have fixed the issue. I’m only using the default stream 0 for everything.

[Edit/Update]: My mistake. cudaThreadSynchronize is working as expected for me. I had inadvertently moved some code around with my recent changes and hadn’t fully tested. To answer sawan83’s question, cudaThreadSynchronize() should be fine, as it blocks host execution until all outstanding tasks in all streams are complete. Alternatively, you can use cudaStreamSynchronize(stream) if you’re only interested in targeting a specific stream. The choice is yours.
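
For anyone landing here later, this is a minimal sketch of the pattern discussed above, not the original poster's code. The helper name asyncRoundTrip and the buffer names hostIn and hostOut are made up for illustration, and it assumes a single non-default stream and pinned host memory (cudaMallocHost), since copies from pageable memory may not actually overlap with host execution. Note that in newer CUDA versions cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize(), which is why the comment mentions the latter.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): round-trips `size` bytes
// host -> device -> host on a single non-default stream.
// Returns true if every CUDA call succeeded and the data came back intact.
bool asyncRoundTrip(size_t size)
{
    unsigned char *hostIn = nullptr, *hostOut = nullptr, *deviceMem = nullptr;
    cudaStream_t stream = nullptr;
    bool ok = false;

    // Pinned host buffers allow the copies to run asynchronously.
    if (cudaMallocHost((void **)&hostIn, size) == cudaSuccess &&
        cudaMallocHost((void **)&hostOut, size) == cudaSuccess &&
        cudaMalloc((void **)&deviceMem, size) == cudaSuccess &&
        cudaStreamCreate(&stream) == cudaSuccess)
    {
        memset(hostIn, 0xAB, size);
        memset(hostOut, 0, size);

        // Both calls only enqueue work; they may return before the copy runs.
        cudaError_t e1 = cudaMemcpyAsync(deviceMem, hostIn, size,
                                         cudaMemcpyHostToDevice, stream);
        cudaError_t e2 = cudaMemcpyAsync(hostOut, deviceMem, size,
                                         cudaMemcpyDeviceToHost, stream);

        // Block the host until everything queued on `stream` has finished.
        // cudaDeviceSynchronize() would also work, but it waits on all streams.
        if (e1 == cudaSuccess && e2 == cudaSuccess &&
            cudaStreamSynchronize(stream) == cudaSuccess)
        {
            ok = (memcmp(hostIn, hostOut, size) == 0);
        }
    }

    // Release whatever was actually allocated.
    if (stream) cudaStreamDestroy(stream);
    if (deviceMem) cudaFree(deviceMem);
    if (hostOut) cudaFreeHost(hostOut);
    if (hostIn) cudaFreeHost(hostIn);
    return ok;
}
```

Calling asyncRoundTrip(1 << 20) from main() on a machine with a CUDA device should return true. Reading hostOut before the synchronize call would be a race, which is the kind of missing-data symptom described earlier in the thread.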