Hi, I am doing a device-to-device cudaMemcpy. In my code I have created two host (CPU) threads:
Thread 1 - copies data from one device memory buffer to another.
Thread 2 - operates on the copied data.
In the CPU program, how can I tell that Thread 1 has completed the memcpy before I instruct Thread 2 to process the buffer, so that it works on the latest data and not on stale or junk data left over in the buffer?
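One way to get a host-side guarantee is sketched below. It assumes the CUDA runtime API and a single device. Thread 1 issues the copy on its own stream and calls cudaStreamSynchronize, which returns only after the copy has finished on the GPU. Only then does it notify Thread 2 through a std::condition_variable. The kernel `addOne`, the `CHECK` macro and the function `runHostSignalDemo` are hypothetical names used for illustration only.

```cuda
#include <cuda_runtime.h>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical kernel standing in for "operate on the copied memory".
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: abort on any CUDA error so failures are not ignored.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Returns the number of mismatching elements (0 means success).
int runHostSignalDemo(int n) {
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *src = nullptr, *dst = nullptr;
    CHECK(cudaMalloc(&src, n * sizeof(int)));
    CHECK(cudaMalloc(&dst, n * sizeof(int)));
    CHECK(cudaMemcpy(src, host.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());  // src is fully populated before the threads start

    std::mutex m;
    std::condition_variable cv;
    bool copyDone = false;

    // Thread 1: device-to-device copy, then block until it has finished on the GPU.
    std::thread producer([&] {
        cudaStream_t s;
        CHECK(cudaStreamCreate(&s));
        CHECK(cudaMemcpyAsync(dst, src, n * sizeof(int), cudaMemcpyDeviceToDevice, s));
        CHECK(cudaStreamSynchronize(s));  // the copy is complete when this returns
        CHECK(cudaStreamDestroy(s));
        { std::lock_guard<std::mutex> lk(m); copyDone = true; }
        cv.notify_one();
    });

    // Thread 2: wait for the host-side signal, then process dst.
    std::thread consumer([&] {
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return copyDone; });
        }
        cudaStream_t s;
        CHECK(cudaStreamCreate(&s));
        addOne<<<(n + 255) / 256, 256, 0, s>>>(dst, n);
        CHECK(cudaGetLastError());
        CHECK(cudaStreamSynchronize(s));
        CHECK(cudaStreamDestroy(s));
    });

    producer.join();
    consumer.join();

    // Every element should now be i + 1 (copied value i, plus one from the kernel).
    CHECK(cudaMemcpy(host.data(), dst, n * sizeof(int), cudaMemcpyDeviceToHost));
    int bad = 0;
    for (int i = 0; i < n; ++i) bad += (host[i] != i + 1);
    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return bad;
}

int main() {
    int bad = runHostSignalDemo(1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The cost of this approach is that Thread 1 stalls on the CPU until the GPU has finished the copy.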
According to http://docs.nvidia.com/cuda/cuda-driver-api/api-sync-behavior.html, “For transfers from device memory to device memory, no host-side synchronization is performed.” Can you please help me understand how to handle this situation? A pointer to any reference code would be helpful.
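"No host-side synchronization" means the copy call can return before the copy has finished. If the CPU does not need to wait, the ordering can instead be enforced on the GPU with a CUDA event. The sketch below assumes the CUDA runtime API and a single device. Thread 1 records an event right after the copy on its stream. Thread 2 makes its own stream wait on that event with cudaStreamWaitEvent before launching its kernel.

cudaStreamWaitEvent waits on the most recent cudaEventRecord call for that event. For this reason, Thread 2 still needs a host-side signal that the record has been *issued*, though not that the copy has *finished*. A std::promise provides that signal here. The names `addOne`, `CHECK` and `runEventDemo` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <future>
#include <thread>
#include <vector>

// Hypothetical kernel standing in for "operate on the copied memory".
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: abort on any CUDA error so failures are not ignored.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Returns the number of mismatching elements (0 means success).
int runEventDemo(int n) {
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *src = nullptr, *dst = nullptr;
    CHECK(cudaMalloc(&src, n * sizeof(int)));
    CHECK(cudaMalloc(&dst, n * sizeof(int)));
    CHECK(cudaMemcpy(src, host.data(), n * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaDeviceSynchronize());  // src is fully populated before the threads start

    cudaEvent_t copied;
    CHECK(cudaEventCreateWithFlags(&copied, cudaEventDisableTiming));
    std::promise<void> recorded;  // host-side signal: "the event has been recorded"
    std::future<void> recordedF = recorded.get_future();

    // Thread 1: enqueue the copy and the event, then return without waiting for the GPU.
    std::thread producer([&] {
        cudaStream_t s;
        CHECK(cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking));
        CHECK(cudaMemcpyAsync(dst, src, n * sizeof(int), cudaMemcpyDeviceToDevice, s));
        CHECK(cudaEventRecord(copied, s));  // completes only after the copy completes
        recorded.set_value();
        CHECK(cudaStreamDestroy(s));  // pending work still runs; resources freed afterwards
    });

    // Thread 2: order its kernel after the copy on the GPU side.
    std::thread consumer([&] {
        recordedF.wait();  // the record must be issued before waiting on the event
        cudaStream_t s;
        CHECK(cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking));
        CHECK(cudaStreamWaitEvent(s, copied, 0));  // the GPU waits; the CPU does not
        addOne<<<(n + 255) / 256, 256, 0, s>>>(dst, n);
        CHECK(cudaGetLastError());
        CHECK(cudaStreamSynchronize(s));  // only needed for the result check below
        CHECK(cudaStreamDestroy(s));
    });

    producer.join();
    consumer.join();

    CHECK(cudaMemcpy(host.data(), dst, n * sizeof(int), cudaMemcpyDeviceToHost));
    int bad = 0;
    for (int i = 0; i < n; ++i) bad += (host[i] != i + 1);
    CHECK(cudaEventDestroy(copied));
    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return bad;
}

int main() {
    int bad = runEventDemo(1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Compared with blocking in Thread 1, this keeps both CPU threads free while the GPU enforces the copy-then-process order.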