I’m using the CUDA Driver API to run a multi-GPU application. Currently I have two contexts, one per device. I do a simple transfer from the first GPU to the second GPU, and then transfer data from the second GPU to the host. When I allocate memory on each GPU (via cuMemAlloc), I make sure to call cuCtxPushCurrent/cuCtxPopCurrent so that the right context is current. When I schedule the transfer from the first GPU to the second one, I also wrap the memcpy in the corresponding cuCtxPushCurrent/cuCtxPopCurrent calls.
However, for the transfer from the second GPU to the host, do I need cuCtxPushCurrent/cuCtxPopCurrent to make the right context current? I tried removing them and the application still worked. Are memcpys independent of the context that owns the buffers involved?
Here is pseudocode for what I’m describing:
CUcontext popped; // cuCtxPopCurrent takes a CUcontext* and writes the removed context here
// Transfer from GPU1 to GPU2
cuCtxPushCurrent(cudaContexts[0]);
// Do work and transfer data
cuCtxPopCurrent(&popped);
// Transfer from GPU2 to host
cuCtxPushCurrent(cudaContexts[1]);
cuMemcpyDtoH(hostBuffer, devBuffer, bufSize);
cuCtxPopCurrent(&popped);
If I remove the cuCtxPushCurrent/cuCtxPopCurrent pair around the cudaContexts[1] copy, the program still works. Is this because of UVA (unified virtual addressing)?
Thanks!