Is it necessary to call cudaDeviceSynchronize() after cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost)?


Kernel call

cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost);

cudaDeviceSynchronize(); // Is this required here?

I am asking because cudaMemcpy() is a blocking call, unlike cudaMemcpyAsync(), which returns control to the host while the copy proceeds on the device.

I understand that cudaMemcpyAsync() would definitely require cudaDeviceSynchronize(), but does the normal cudaMemcpy() also require a sync?

If someone can clarify this, it would be very helpful.
Until now I have not used cudaDeviceSynchronize() after the normal cudaMemcpy() and have never had any error or issue on the default stream.

Is cudaDeviceSynchronize() mandatory after cudaMemcpy()?

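No. cudaMemcpy() is blocking and effectively has a cudaDeviceSynchronize() built into it: when everything is issued to the default stream, as in your code, the device-to-host copy waits for the preceding kernel to finish and does not return to the host until the data is in the host buffer.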
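
To make the point concrete, here is a minimal sketch of the pattern from the question, assuming everything runs on the default stream. The kernel fill_doubled and the helper copy_back_matches are hypothetical names invented for illustration; only the cudaMemcpy() call and its arguments come from the question. The host reads the buffer immediately after cudaMemcpy() returns, with no explicit cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: writes 2*i into out[i].
__global__ void fill_doubled(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * i;
}

// Returns true if the host buffer holds the kernel's results right after
// cudaMemcpy() returns, without any explicit cudaDeviceSynchronize().
bool copy_back_matches()
{
    const int n = 1 << 20;
    const size_t size = n * sizeof(int);

    int *host = static_cast<int *>(malloc(size));
    int *device = nullptr;
    if (host == nullptr || cudaMalloc(&device, size) != cudaSuccess) {
        free(host);
        return false;
    }

    // Kernel launch on the default stream; control returns to the host at once.
    fill_doubled<<<(n + 255) / 256, 256>>>(device, n);

    // Blocking copy on the default stream: it is ordered after the kernel and
    // returns only once the data has arrived in the host buffer.
    cudaError_t err = cudaMemcpy(host, device, size, cudaMemcpyDeviceToHost);

    // No cudaDeviceSynchronize() here; the host reads the results directly.
    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (host[i] == 2 * i);

    cudaFree(device);
    free(host);
    return ok;
}

int main()
{
    bool ok = copy_back_matches();
    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

If the copy were issued with cudaMemcpyAsync() instead, the call could return before the transfer finishes, so the host would need to wait (for example with cudaStreamSynchronize() on that stream, or cudaDeviceSynchronize()) before reading host.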