Hi, today I’m facing a weird error:
I have several streams and several buffers.
I'm doing something like this:
d_bufx_x is allocated with cudaMalloc, h_bufx_x with cudaHostAlloc
kernel<<<stream1>>>(d_buf1_1, d_buf1_2);
cudaMemcpyAsync(h_buf1_1, d_buf1_1, stream1);
cudaMemcpyAsync(h_buf1_2, d_buf1_2, stream1);
kernel<<<stream2>>>(d_buf2_1, d_buf2_2);
cudaMemcpyAsync(h_buf2_1, d_buf2_1, stream2);
cudaMemcpyAsync(h_buf2_2, d_buf2_2, stream2);
...
cudaDeviceSynchronize();
At this point I would expect all data to be computed and copied, but for some reason the contents of h_buf2_2 are corrupted
(it consists of only 6 int values, and some of them are filled with memory from somewhere else).
I get neither CUDA errors nor any exception; everything seems to be fine.
Note: it works without problems with cudaMemcpy instead of cudaMemcpyAsync.
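To make the setup concrete, here is a minimal, self-contained sketch of the pattern described above. It is not my real code: the kernel body, the launch configuration (1 block of 32 threads), the two-stream layout, the run_pattern function and the CUDA_CHECK helper are all placeholders I made up for illustration. It checks the return value of every runtime call and verifies the host buffers after cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

constexpr int N = 6;            // 6 int values per buffer, as in the post
constexpr int NUM_STREAMS = 2;  // assumption: two streams are enough to show the pattern

// Placeholder kernel: writes values that are easy to verify on the host.
__global__ void kernel(int *a, int *b, int tag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        a[i] = tag * 100 + i;
        b[i] = tag * 1000 + i;
    }
}

// Runs the kernel + async copy pattern and returns the number of wrong host values.
int run_pattern()
{
    cudaStream_t stream[NUM_STREAMS];
    int *d_buf[NUM_STREAMS][2];
    int *h_buf[NUM_STREAMS][2];
    const size_t bytes = N * sizeof(int);

    for (int s = 0; s < NUM_STREAMS; ++s) {
        CUDA_CHECK(cudaStreamCreate(&stream[s]));
        for (int k = 0; k < 2; ++k) {
            CUDA_CHECK(cudaMalloc((void **)&d_buf[s][k], bytes));
            // Pinned host memory, as with cudaHostAlloc in the post.
            CUDA_CHECK(cudaHostAlloc((void **)&h_buf[s][k], bytes, cudaHostAllocDefault));
        }
    }

    for (int s = 0; s < NUM_STREAMS; ++s) {
        // The stream is the fourth launch parameter: <<<grid, block, sharedMem, stream>>>.
        kernel<<<1, 32, 0, stream[s]>>>(d_buf[s][0], d_buf[s][1], s + 1);
        CUDA_CHECK(cudaGetLastError());  // catches launch configuration errors
        for (int k = 0; k < 2; ++k) {
            CUDA_CHECK(cudaMemcpyAsync(h_buf[s][k], d_buf[s][k], bytes,
                                       cudaMemcpyDeviceToHost, stream[s]));
        }
    }

    // Wait for all streams; this also reports asynchronous execution errors.
    CUDA_CHECK(cudaDeviceSynchronize());

    int mismatches = 0;
    for (int s = 0; s < NUM_STREAMS; ++s) {
        for (int i = 0; i < N; ++i) {
            if (h_buf[s][0][i] != (s + 1) * 100 + i) ++mismatches;
            if (h_buf[s][1][i] != (s + 1) * 1000 + i) ++mismatches;
        }
    }

    for (int s = 0; s < NUM_STREAMS; ++s) {
        for (int k = 0; k < 2; ++k) {
            CUDA_CHECK(cudaFree(d_buf[s][k]));
            CUDA_CHECK(cudaFreeHost(h_buf[s][k]));
        }
        CUDA_CHECK(cudaStreamDestroy(stream[s]));
    }
    return mismatches;
}

int main()
{
    int bad = run_pattern();
    std::printf("mismatching values: %d\n", bad);
    return bad != 0;
}
```

This should build with nvcc as a single .cu file. If a reduced version like this behaves correctly while the real program does not, the difference between the two (buffer sizes, buffer lifetimes, which stream each call actually uses) is probably the place to look.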
What am I doing wrong?