Suppose I have
cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1);
cudaStreamSynchronize(stream1);
where dev2 is a pointer on device 2, dev1 is a pointer on device 1 and stream1 is a stream on device 1.
After cudaStreamSynchronize() returns, is it guaranteed that
- the data has been copied to dev2, i.e., the whole copy has finished?
Or does it only guarantee that
- the data has been read out of dev1, so dev1 can be reused, but the data has not necessarily arrived in dev2?
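
To make the setup concrete, here is a minimal, self-contained sketch of the scenario. Several things in it are assumptions for illustration and not part of the original snippet: the device ordinals 0 and 1 stand in for "device 1" and "device 2", the buffer size and fill pattern are arbitrary, the CHECK macro is a hypothetical helper, and the two GPUs are assumed to share a unified virtual address space so that cudaMemcpyDeviceToDevice works across them. Note also that the read-back at the end uses a blocking cudaMemcpy, which synchronizes by itself, so a matching result is only consistent with the first interpretation and does not prove it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) {
        std::printf("This sketch needs at least two CUDA devices.\n");
        return 0;
    }

    // Assumed ordinals: 0 plays the role of "device 1", 1 of "device 2".
    const int srcDevice = 0;
    const int dstDevice = 1;
    const size_t N = 1 << 20;  // copy size in bytes (arbitrary)

    std::vector<unsigned char> host(N, 0xAB);  // known source pattern
    std::vector<unsigned char> readback(N, 0); // filled from dev2 later

    unsigned char *dev1 = nullptr;
    unsigned char *dev2 = nullptr;
    cudaStream_t stream1;

    // Destination buffer on the second device, cleared to zero.
    CHECK(cudaSetDevice(dstDevice));
    CHECK(cudaMalloc((void **)&dev2, N));
    CHECK(cudaMemset(dev2, 0, N));
    CHECK(cudaDeviceSynchronize());

    // Source buffer and stream on the first device.
    CHECK(cudaSetDevice(srcDevice));
    CHECK(cudaMalloc((void **)&dev1, N));
    CHECK(cudaMemcpy(dev1, host.data(), N, cudaMemcpyHostToDevice));
    CHECK(cudaStreamCreate(&stream1));

    // The two calls from the question.
    CHECK(cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1));
    CHECK(cudaStreamSynchronize(stream1));

    // Observe what is in dev2 now. This blocking copy synchronizes on its
    // own, so it is an observation, not a proof of the guarantee.
    CHECK(cudaMemcpy(readback.data(), dev2, N, cudaMemcpyDeviceToHost));

    bool match = (readback == host);
    std::printf("dev2 %s the source data\n",
                match ? "matches" : "does NOT match");

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(dev1));
    CHECK(cudaSetDevice(dstDevice));
    CHECK(cudaFree(dev2));
    return match ? 0 : 1;
}
```

This should build with something like `nvcc sketch.cu -o sketch` on a machine with two GPUs; the file name is just a placeholder.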