I wrote the program below. To make it easier to understand, here is the pseudocode:
Perform the operation using parallel GPU threads and blocks:
Send the data from CPU memory to GPU memory.
Call the GPU function (kernel) to perform the operation.
cudaDeviceSynchronize(); // this should make the CPU wait for the GPU, but it does not seem to work
Send the data from GPU memory back to CPU memory.
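To make the steps above concrete, here is a minimal sketch in CUDA. The kernel `square`, the helper `squareOnGpu`, the array size, and the block size are all placeholders chosen for illustration; they are not the actual program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread squares one element of the array.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

// Runs the four steps from the pseudocode. Returns true if every CUDA call succeeded.
bool squareOnGpu(float *host, int n) {
    float *dev = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    // Step 1: send the data from CPU memory to GPU memory.
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        // Step 2: call the GPU function with enough blocks to cover n elements.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        square<<<blocks, threads>>>(dev, n);

        // Step 3: make the CPU wait until the GPU has finished.
        err = cudaDeviceSynchronize();
    }

    // Step 4: send the data from GPU memory back to CPU memory.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess;
}

int main() {
    const int n = 1000;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    if (!squareOnGpu(h, n)) {
        printf("a CUDA call failed\n");
        return 1;
    }
    printf("h[3] = %f\n", h[3]);
    return 0;
}
```

This can be compiled with `nvcc`, for example `nvcc sketch.cu -o sketch` (the file name is arbitrary).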
I think the result is wrong because, after the CPU calls the GPU function, the GPU starts running but the CPU does not wait for it to finish.
The data the CPU copies back from the GPU is therefore wrong, because the GPU has not completed its work yet. How can I avoid this error? Thank you for your help.
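One way to test this suspicion is to check the error codes that CUDA returns. A blocking `cudaMemcpy` on the default stream already waits for earlier kernels in that stream, so stale output can also mean the kernel never ran at all. The sketch below is only an illustration under assumed names (`addOne` and `launchChecked` are hypothetical): it launches a kernel once with a valid block size and once with a deliberately oversized block size of 4096 threads, which is above the 1024-thread limit of current GPUs, and prints what CUDA reports in each case.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds one to its element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Launches the kernel and returns the first error from the launch or the execution.
cudaError_t launchChecked(int *dev, int n, int threadsPerBlock) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    addOne<<<blocks, threadsPerBlock>>>(dev, n);

    // Launch errors (such as an invalid configuration) are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Errors that occur while the kernel is running are reported here.
    return cudaDeviceSynchronize();
}

int main() {
    const int n = 4096;
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return 1;
    cudaMemset(dev, 0, n * sizeof(int));

    // 256 is a valid block size; 4096 is deliberately too large.
    const int configs[2] = {256, 4096};
    for (int k = 0; k < 2; ++k) {
        cudaError_t err = launchChecked(dev, n, configs[k]);
        printf("threads per block = %d: %s\n", configs[k], cudaGetErrorString(err));
    }

    cudaFree(dev);
    return 0;
}
```

If the second launch reports an error, the kernel did not run and the device buffer is left unchanged, which from the CPU side looks the same as reading the data too early.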