How to share CUDA memory between two processes?

Q1) From what I understand, prior to the Volta architecture, if MPS is enabled, all processes share the same device address space. So if one process allocates device memory and passes the address to another process via a pipe, a kernel launched by the second process should be able to access that memory. Is this correct?

Q2) I also want to share pinned host memory between two processes while MPS is enabled. I see that MPS itself uses mmap() with MAP_SHARED and backs the memory with a temporary file in /dev/shm.

Output of strace:
open("/dev/shm/cuda.shm.3e8.48b2.173", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 32
fstat(32, {st_mode=S_IFREG|0600, st_size=2097152, …}) = 0
lseek(32, 0, SEEK_END) = 2097152
mmap(0x201b800000, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 32, 0) = 0x201b800000

So my question is: which CUDA API lets me share pinned host memory between processes while using MPS?

Q1: No, you would need to use CUDA IPC for this.
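For reference, here is a minimal sketch of the CUDA IPC route in C, assuming Linux and a device buffer allocated with cudaMalloc(). The exporting process gets an opaque cudaIpcMemHandle_t from cudaIpcGetMemHandle() and writes its bytes to a pipe; the importing process reads those bytes and calls cudaIpcOpenMemHandle() to map the same allocation. The helper names (send_all, recv_all, export_device_buffer, import_device_buffer) and the USE_CUDA guard are hypothetical. The guard only lets the pipe helpers build without the CUDA toolkit; build the full version with -DUSE_CUDA and link against cudart.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#ifdef USE_CUDA
#include <cuda_runtime_api.h>
#endif

/* Hypothetical helper: write exactly len bytes to fd,
   retrying on short writes and EINTR. Returns 0 on success. */
int send_all(int fd, const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read exactly len bytes from fd.
   Fails if the writer closes the pipe before len bytes arrive. */
int recv_all(int fd, void *buf, size_t len)
{
    char *p = (char *)buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* EOF: writer closed the pipe early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

#ifdef USE_CUDA
/* Exporting process (hypothetical helper): dev_ptr must come from
   cudaMalloc() and stay allocated while the importer uses it. */
int export_device_buffer(int fd, void *dev_ptr)
{
    cudaIpcMemHandle_t h;
    if (cudaIpcGetMemHandle(&h, dev_ptr) != cudaSuccess)
        return -1;
    return send_all(fd, &h, sizeof h);
}

/* Importing process (hypothetical helper): maps the exporter's allocation
   into this process. The returned pointer need not be numerically equal
   to the exporter's pointer. Release it with cudaIpcCloseMemHandle(). */
void *import_device_buffer(int fd)
{
    cudaIpcMemHandle_t h;
    void *dev_ptr = NULL;
    if (recv_all(fd, &h, sizeof h) != 0)
        return NULL;
    if (cudaIpcOpenMemHandle(&dev_ptr, h, cudaIpcMemLazyEnablePeerAccess) != cudaSuccess)
        return NULL;
    return dev_ptr;
}
#endif
```

The handle is sent instead of the raw pointer because the importer gets its own mapping of the allocation. The exporter must keep the buffer alive until the importer has called cudaIpcCloseMemHandle().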

Q2: There isn’t any CUDA API for this.

After some searching, I found an answer to Q2. The driver API function cuMemHostRegister() page-locks (pins) an existing, already mmap’ed host range and registers it with the device. So I create a shared-memory segment in a main process (using shm_open(), ftruncate() and mmap()) and register it with the device using cuMemHostRegister(). After that, any process that mmap’s the same shared-memory segment can access the same memory, and the device can access it too because it is pinned. This lets multiple processes share pinned host memory. (MPS creates a single device address space for all client processes. The MPS documentation lists this as a limitation: because MPS provides no memory isolation, two processes using MPS can clobber each other’s device memory. I am simply using this as an advantage instead of a limitation.)
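The shared-memory approach described above can be sketched in C as follows, assuming Linux/POSIX shared memory. map_shared_segment() and register_with_cuda() are hypothetical helper names. The sketch uses the runtime-API call cudaHostRegister() (the counterpart of the driver-API cuMemHostRegister() mentioned in the post), plus cudaHostGetDevicePointer() to obtain a pointer kernels can use. It registers the memory only in the process that calls register_with_cuda(). The thread does not establish whether other processes' mappings are device-usable without registering them too, so registering in each process that launches work on the memory is the conservative assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifdef USE_CUDA
#include <cuda_runtime_api.h>
#endif

/* Hypothetical helper: open (and, if create != 0, create and size) a named
   POSIX shared-memory object and map it MAP_SHARED. Returns NULL on error. */
void *map_shared_segment(const char *name, size_t size, int create)
{
    int fd = shm_open(name, O_RDWR | (create ? O_CREAT : 0), 0600);
    if (fd < 0)
        return NULL;
    /* Only the creator sizes the object; other processes see that size. */
    if (create && ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

#ifdef USE_CUDA
/* Hypothetical helper: pin an existing mapping in THIS process and map it
   into the device address space. Returns the pointer to pass to kernels,
   or NULL on error. Undo with cudaHostUnregister(host_ptr). */
void *register_with_cuda(void *host_ptr, size_t size)
{
    void *dev_ptr = NULL;
    if (cudaHostRegister(host_ptr, size,
                         cudaHostRegisterPortable | cudaHostRegisterMapped) != cudaSuccess)
        return NULL;
    if (cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0) != cudaSuccess) {
        cudaHostUnregister(host_ptr);
        return NULL;
    }
    return dev_ptr;
}
#endif
```

Each process passes the same segment name (the creator with create = 1, the others with create = 0). On older glibc versions shm_open() requires linking with -lrt. Remove the segment with shm_unlink() once no process needs it.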

I also tested this: with MPS enabled (at least on a pre-Volta architecture), kernels in two different processes can share device memory through the same device pointer, without using IPC. (Without MPS, if one process tries to access memory allocated by another process, I get an “invalid memory” error, which is expected.)

So txbob, am I doing something wrong here, or are there limitations to this approach? Did I just get lucky this time, and will it break under other conditions, such as multiple devices?

For Q1, I don’t believe your method will work on Volta, and on pre-Volta I’m not aware that such functionality is documented or officially supported. In general, I think sharing a bare pointer between two processes without using a documented IPC method is sketchy. You’re welcome to do whatever you wish; it may work (due to how pre-Volta MPS behaves).

For Q2, I wouldn’t be able to comment on your method.