Hi,
Has anybody successfully synchronized streams using cudaThreadSynchronize()? I call it in each iteration and check cudaGetLastError() after the synchronization. No error is returned, but the result isn’t correct.
Here is what I did:
- create two streams, stream[0] and stream[1], before entering the for loop
- in each iteration:
a. stream[0] transfers chunk n from host to device
b. stream[1] transfers chunk n-1 from device to host
c. cudaThreadSynchronize()
d. call cudaGetLastError() and print the result if it is not cudaSuccess
Since I don’t modify the data on the device, I expected to get back exactly the data I sent, but I didn’t.
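For comparison, here is a minimal sketch of the loop described above, written under a few assumptions that are not in the original post: the upload of chunk n and the download of chunk n-1 can run at the same time, so they use two separate device buffers (double buffering) instead of one; the host buffers are page-locked with cudaMallocHost so cudaMemcpyAsync can actually overlap; and one extra iteration drains the last chunk, which the per-iteration schedule would otherwise never copy back. It uses cudaDeviceSynchronize(), the non-deprecated replacement for cudaThreadSynchronize(). The names run_pipeline and CHECK are hypothetical, and this is only one possible explanation for the wrong result, not a diagnosis of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical error-check helper: reports the failing call and returns -1
// from the enclosing function (resources are not freed on this path; sketch only).
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s failed: %s\n", #call,             \
                         cudaGetErrorString(err_));                    \
            return -1;                                                 \
        }                                                              \
    } while (0)

// Round-trips `chunks` chunks of `n` ints through the GPU using the two-stream
// schedule from the post. Returns the number of mismatching elements, or -1 on error.
int run_pipeline(int chunks, int n) {
    size_t bytes = (size_t)n * sizeof(int);
    size_t total = (size_t)n * chunks;
    int *h_in = nullptr, *h_out = nullptr;
    int *d_buf[2] = {nullptr, nullptr};
    cudaStream_t stream[2];

    // Page-locked host memory so the async copies can overlap.
    CHECK(cudaMallocHost(&h_in, total * sizeof(int)));
    CHECK(cudaMallocHost(&h_out, total * sizeof(int)));
    // Two device buffers: chunk k goes into d_buf[k % 2] while chunk k-1
    // is read back from the other one, so the two copies never touch the same memory.
    CHECK(cudaMalloc(&d_buf[0], bytes));
    CHECK(cudaMalloc(&d_buf[1], bytes));
    CHECK(cudaStreamCreate(&stream[0]));
    CHECK(cudaStreamCreate(&stream[1]));

    for (size_t i = 0; i < total; ++i) h_in[i] = (int)i;

    // k runs to `chunks` inclusive: the last iteration only downloads the final chunk.
    for (int k = 0; k <= chunks; ++k) {
        if (k < chunks)  // a. upload chunk k on stream[0]
            CHECK(cudaMemcpyAsync(d_buf[k % 2], h_in + (size_t)k * n, bytes,
                                  cudaMemcpyHostToDevice, stream[0]));
        if (k > 0)       // b. download chunk k-1 on stream[1]
            CHECK(cudaMemcpyAsync(h_out + (size_t)(k - 1) * n, d_buf[(k - 1) % 2],
                                  bytes, cudaMemcpyDeviceToHost, stream[1]));
        CHECK(cudaDeviceSynchronize());  // c. wait for both streams
        CHECK(cudaGetLastError());       // d. check for errors
    }

    // Nothing was done to the data, so the output should equal the input.
    int mismatches = 0;
    for (size_t i = 0; i < total; ++i)
        if (h_out[i] != h_in[i]) ++mismatches;

    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
    cudaFree(d_buf[0]);
    cudaFree(d_buf[1]);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return mismatches;
}

int main() {
    int bad = run_pipeline(8, 1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the loop writes chunk n into the same device buffer that chunk n-1 is being read from, or stops after iteration `chunks - 1`, the comparison at the end would be expected to report mismatches even though no CUDA call returns an error.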
Could someone suggest what might cause this problem? Many thanks!!