Stream synchronization problem: cudaThreadSynchronize() returns no error, but the results are wrong

Has anybody successfully synchronized streams using cudaThreadSynchronize()? I call it in every iteration and check cudaGetLastError() right afterwards. No error is reported, but the results are not correct.

Here is what I did:

  1. create two streams, stream[0] and stream[1], before entering a for loop
  2. in each iteration n:
    a. stream[0] copies chunk n from host to device
    b. stream[1] copies chunk n-1 from device to host
    c. call cudaThreadSynchronize()
    d. print the result of cudaGetLastError() if it is not cudaSuccess
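For comparison, here is a minimal CUDA sketch of the loop described above. It rests on assumptions the post does not state: the data is split into equal chunks of floats, each chunk gets its own region of device memory, and the host buffers are pinned with cudaMallocHost. The names copy_round_trip, CHECK, nChunks and chunkSize are hypothetical. It guards three spots where a loop like this can return wrong data without any error being reported: there is no chunk -1 to download on the first iteration, the last chunk needs one extra iteration to come back, and reusing a single device buffer would let the upload of chunk n overwrite chunk n-1 while stream[1] is still reading it. It also uses cudaDeviceSynchronize(), which replaces the deprecated cudaThreadSynchronize(), and checks the return value of every call rather than relying only on cudaGetLastError().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical function (not from the post): round-trips nChunks chunks of
// chunkSize floats, uploading chunk n on stream[0] while downloading chunk
// n-1 on stream[1]. Returns the number of elements that came back different.
int copy_round_trip(int nChunks, int chunkSize)
{
    const size_t total = (size_t)nChunks * chunkSize;
    const size_t chunkBytes = (size_t)chunkSize * sizeof(float);

    float *hIn, *hOut, *dBuf;
    // Pinned host memory lets cudaMemcpyAsync overlap with other copies;
    // pageable memory would still give correct data, just without overlap.
    CHECK(cudaMallocHost((void**)&hIn, total * sizeof(float)));
    CHECK(cudaMallocHost((void**)&hOut, total * sizeof(float)));
    // One device region per chunk, so no chunk is overwritten while in use.
    CHECK(cudaMalloc((void**)&dBuf, total * sizeof(float)));

    for (size_t i = 0; i < total; ++i) { hIn[i] = (float)i; hOut[i] = -1.0f; }

    cudaStream_t stream[2];
    CHECK(cudaStreamCreate(&stream[0]));
    CHECK(cudaStreamCreate(&stream[1]));

    // Iteration n uploads chunk n and downloads chunk n-1; the extra
    // iteration n == nChunks only downloads the final chunk.
    for (int n = 0; n <= nChunks; ++n) {
        if (n < nChunks)
            CHECK(cudaMemcpyAsync(dBuf + (size_t)n * chunkSize,
                                  hIn + (size_t)n * chunkSize, chunkBytes,
                                  cudaMemcpyHostToDevice, stream[0]));
        if (n > 0)
            CHECK(cudaMemcpyAsync(hOut + (size_t)(n - 1) * chunkSize,
                                  dBuf + (size_t)(n - 1) * chunkSize, chunkBytes,
                                  cudaMemcpyDeviceToHost, stream[1]));
        // Wait for both streams: chunk n must be on the device before
        // stream[1] reads it in the next iteration.
        CHECK(cudaDeviceSynchronize());
    }

    int mismatches = 0;
    for (size_t i = 0; i < total; ++i)
        if (hOut[i] != hIn[i]) ++mismatches;

    CHECK(cudaStreamDestroy(stream[0]));
    CHECK(cudaStreamDestroy(stream[1]));
    CHECK(cudaFree(dBuf));
    CHECK(cudaFreeHost(hIn));
    CHECK(cudaFreeHost(hOut));
    return mismatches;
}

int main()
{
    int bad = copy_round_trip(4, 1 << 20);
    std::printf("%d mismatched elements\n", bad);
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Synchronizing the whole device once per iteration is the simplest way to order the two streams; recording an event on stream[0] with cudaEventRecord and making stream[1] wait on it with cudaStreamWaitEvent would express the same dependency more narrowly.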

Since I don’t modify the data on the device, I expected to get back exactly the data I sent, but I don’t.

Can anyone suggest what might cause this? Many thanks!