Obtaining a non-peer GPU pointer from different processes

My program runs a single process per GPU in the system, and I need an interface for GPU-to-GPU memcpy. I’d like to use P2P access when possible; when it is not, it’s okay to fall back to going through CPU memory, though I’d prefer a DtoD memcpy over doing DtoH + HtoD explicitly. This requires me to obtain a device pointer (the source or destination of the memcpy) from a different process that is running on a non-peer GPU. cuIpcOpenMemHandle() works for this when the other GPU is a peer, but it returns CUDA_ERROR_PEER_ACCESS_UNSUPPORTED (217) when the GPU is non-peer.
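To make the setup concrete, here is a rough sketch of the importing side, not my exact code. The names choosePath, copyFromRemote, CopyPath and the SKETCH_WITH_CUDA macro are made up for illustration; only the cuIpc*/cuMemcpy* calls are real driver API. It assumes a context is already current on the local GPU and that the handle was produced by cuIpcGetMemHandle() in the other process and sent over some IPC channel.

```cpp
#include <cstddef>

// Numeric values of the two driver results this sketch distinguishes.
constexpr int kSuccess = 0;                  // CUDA_SUCCESS
constexpr int kPeerAccessUnsupported = 217;  // CUDA_ERROR_PEER_ACCESS_UNSUPPORTED

enum class CopyPath { PeerDirect, HostStaged, Error };

// Map the result of cuIpcOpenMemHandle() to a copy strategy (hypothetical helper).
CopyPath choosePath(int openResult) {
    if (openResult == kSuccess) return CopyPath::PeerDirect;
    if (openResult == kPeerAccessUnsupported) return CopyPath::HostStaged;
    return CopyPath::Error;
}

#ifdef SKETCH_WITH_CUDA  // hypothetical macro: define it when building with -lcuda
#include <cuda.h>

// Try to map the remote allocation and copy from it directly.
// On 217 nothing is copied; the caller has to use the host-staged path.
CopyPath copyFromRemote(CUdeviceptr dst, CUipcMemHandle handle, size_t bytes) {
    CUdeviceptr src = 0;
    CUresult r = cuIpcOpenMemHandle(&src, handle,
                                    CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS);
    CopyPath path = choosePath(static_cast<int>(r));
    if (path == CopyPath::PeerDirect) {
        CUresult c = cuMemcpyDtoD(dst, src, bytes);
        cuIpcCloseMemHandle(src);  // unmap the imported allocation
        if (c != CUDA_SUCCESS) return CopyPath::Error;
    }
    return path;
}
#endif
```

With peer GPUs this takes the PeerDirect branch; with non-peer GPUs the open call itself fails with 217, so I never get a pointer I could pass to a DtoD copy.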

  1. Is it impossible to get a non-peer GPU pointer across processes (through IPC)? I.e. is peer-accessibility required to use cuIpcOpenMemHandle()?

  2. Minor question: is cuIpcOpenMemHandle() actually expected to return 217? The API doc doesn’t list 217 among its possible return values.