Is there a way to give hints about data movement to the CUDA unified memory runtime? For example, in the workflow below, I initialize and compute the data on the GPU. When the computation is finished (after the for loop), I read the data on the host. What would the CUDA unified memory runtime's behavior be?
My workflow is:

Alloc()
Init<<< >>>()
for (0 ... N)
    Compute<<< >>>()
cudaDeviceSynchronize()
Read_host()
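To make the workflow concrete, here is a minimal sketch of what I mean, assuming the buffer is allocated with cudaMallocManaged; the kernel names, sizes, and iteration count are placeholders:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels standing in for Init<<< >>>() and Compute<<< >>>()
__global__ void Init(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 0.0f;
}

__global__ void Compute(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));  // Alloc(): unified memory

    Init<<<(n + 255) / 256, 256>>>(data, n);      // Init<<< >>>() on the GPU
    for (int iter = 0; iter < 10; ++iter)         // for (0 ... N)
        Compute<<<(n + 255) / 256, 256>>>(data, n);

    cudaDeviceSynchronize();                      // wait for GPU work to finish

    // Read_host(): first host access to the managed buffer after GPU work;
    // this is where the unified memory runtime migrates pages back.
    printf("data[0] = %f\n", data[0]);

    cudaFree(data);
    return 0;
}
```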