cudaThreadSynchronize and cudaMemcpy

Hi, all.
I have a question about the timing of cudaThreadSynchronize and cudaMemcpy.

Some sample programs don't call cudaThreadSynchronize before cudaMemcpy.
In that case, does cudaMemcpy wait for the GPU kernel to finish, or does it run asynchronously?
Where can I find the details on this in the programming guide?

Thanks.

Yes, cudaMemcpy waits for the kernel to finish before it starts copying, so the samples don't need an explicit cudaThreadSynchronize there. Take a look at section 4.5.1.5 in the programming guide.
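
To illustrate, here is a minimal sketch of the pattern those samples use. The kernel name (scale), the array size, and the launch configuration are made up for the example, and it assumes everything runs in the default stream with the plain blocking cudaMemcpy (not cudaMemcpyAsync). The kernel launch returns to the host right away, and the copy back is what waits for the kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main()
{
    const int n = 1024;              // assumed size, chosen arbitrarily
    float host[n];
    for (int i = 0; i < n; ++i)
        host[i] = 1.0f;

    float *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // The launch is asynchronous with respect to the host:
    // control returns here before the kernel has finished.
    scale<<<(n + 255) / 256, 256>>>(dev, n);

    // No explicit synchronize call here. The blocking cudaMemcpy is
    // issued after the kernel in the default stream, so it does not
    // start copying until the kernel has completed.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    printf("host[0] = %f\n", host[0]);
    cudaFree(dev);
    return 0;
}
```

An explicit synchronize is still useful when you want to time the kernel on the host or check for kernel errors before doing anything else; it just isn't required for the copy to see the kernel's results.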

I found the description. It turns out that section 4.5.1.5 was extended in programming guide 1.1.

Thanks a lot!