Result of device-to-device cudaMemcpyAsync with stream synchronized


Suppose I have

cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1)

where dev2 points to memory on device 2, dev1 points to memory on device 1, and stream1 is a stream created on device 1.
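For concreteness, here is a minimal sketch of that setup. It assumes a machine with at least two GPUs and unified virtual addressing (UVA), which is what lets the runtime work out from the pointers alone which GPU each buffer lives on. The CUDA ordinals 0 and 1 stand in for "device 1" and "device 2", and the helper name enqueue_peer_copy is hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical ordinals standing in for the question's "device 1" and "device 2".
const int kSrcDev = 0;
const int kDstDev = 1;

// Hypothetical helper: allocates N bytes on each GPU, creates stream1 on the
// source GPU and enqueues the copy. It returns as soon as the copy is queued;
// the copy itself runs asynchronously on stream1.
cudaError_t enqueue_peer_copy(size_t N, void** dev1, void** dev2,
                              cudaStream_t* stream1) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, kSrcDev);
    if (err != cudaSuccess) return err;
    // Mixing pointers from two GPUs in one cudaMemcpyAsync relies on UVA.
    if (!prop.unifiedAddressing) return cudaErrorNotSupported;

    // Destination buffer lives on the second GPU.
    if ((err = cudaSetDevice(kDstDev)) != cudaSuccess) return err;
    if ((err = cudaMalloc(dev2, N)) != cudaSuccess) return err;

    // Source buffer and stream live on the first GPU; a stream belongs to
    // whichever device is current when it is created.
    if ((err = cudaSetDevice(kSrcDev)) != cudaSuccess) return err;
    if ((err = cudaMalloc(dev1, N)) != cudaSuccess) return err;
    if ((err = cudaStreamCreate(stream1)) != cudaSuccess) return err;

    return cudaMemcpyAsync(*dev2, *dev1, N, cudaMemcpyDeviceToDevice, *stream1);
}
```

The runtime also offers cudaMemcpyPeerAsync(dst, dstDevice, src, srcDevice, count, stream), which names both devices explicitly instead of inferring them from the pointers.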

After cudaStreamSynchronize(stream1) returns, which of the following is guaranteed?

  1. The data has been copied to dev2, i.e., the whole copy has finished.
  2. The data has only been read from dev1, so dev1 can be reused, but it has not necessarily arrived in dev2 yet.


It guarantees that all operations previously issued to stream1 have completed. The copy counts as complete only once the data has been written to dev2, so option 1 holds: the whole copy has finished.

From the CUDA Runtime API reference for cudaStreamSynchronize:

    Waits for stream tasks to complete.

    Blocks until stream has completed all operations.
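A minimal sketch of relying on that guarantee: fill the source, copy, wait on stream1 only, then read dev2 back and check every byte. It assumes two GPUs with UVA; the ordinals 0 and 1, the helper name copy_then_check and the 0xAB fill pattern are all hypothetical choices for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical ordinals standing in for "device 1" and "device 2".
const int kSrcDev = 0;
const int kDstDev = 1;

// Hypothetical helper: copies N bytes from kSrcDev to kDstDev through a stream
// owned by kSrcDev, waits on that stream, then reads the destination back to
// the host. Returns true if every byte arrived intact.
bool copy_then_check(size_t N) {
    const int kPattern = 0xAB;
    void* dev1 = nullptr;
    void* dev2 = nullptr;
    cudaStream_t stream1 = nullptr;
    std::vector<unsigned char> host(N, 0);

    bool ok =
        cudaSetDevice(kDstDev) == cudaSuccess &&
        cudaMalloc(&dev2, N) == cudaSuccess &&
        cudaSetDevice(kSrcDev) == cudaSuccess &&
        cudaMalloc(&dev1, N) == cudaSuccess &&
        cudaStreamCreate(&stream1) == cudaSuccess &&
        // Fill the source on the same stream so the fill is ordered before the copy.
        cudaMemsetAsync(dev1, kPattern, N, stream1) == cudaSuccess &&
        cudaMemcpyAsync(dev2, dev1, N, cudaMemcpyDeviceToDevice, stream1) == cudaSuccess &&
        // Returns only after every operation issued to stream1, including
        // the write into dev2, has completed.
        cudaStreamSynchronize(stream1) == cudaSuccess &&
        cudaSetDevice(kDstDev) == cudaSuccess &&
        cudaMemcpy(host.data(), dev2, N, cudaMemcpyDeviceToHost) == cudaSuccess;

    if (ok) {
        for (unsigned char b : host) {
            if (b != kPattern) { ok = false; break; }
        }
    }

    // Release resources; cudaFree(nullptr) is a no-op.
    cudaSetDevice(kSrcDev);
    if (stream1) cudaStreamDestroy(stream1);
    cudaFree(dev1);
    cudaSetDevice(kDstDev);
    cudaFree(dev2);
    return ok;
}
```

The readback runs on device 2's default stream, which is not implicitly ordered after work in stream1 (a stream on device 1), so it is the cudaStreamSynchronize(stream1) call that makes reading dev2 safe at that point.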