How to share CUDA memory between two processes?

sakjain92 · July 9, 2018, 2:27am

Q1) From what I understand, prior to the Volta architecture, if MPS is enabled, all processes share the same device address space. So if one process allocates device memory and passes this address to another process via a pipe, then a kernel launched by the second process should be able to access this device memory. Is this correct?

Q2) Also, I want to share host pinned memory between two processes while MPS is enabled. I see that MPS itself uses mmap() with MAP_SHARED and backs the memory with a temporary file in /dev/shm.

Output of strace:
open("/dev/shm/cuda.shm.3e8.48b2.173", O_RDWR|O_NOFOLLOW|O_CLOEXEC) = 32
fstat(32, {st_mode=S_IFREG|0600, st_size=2097152, …}) = 0
lseek(32, 0, SEEK_END) = 2097152
mmap(0x201b800000, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 32, 0) = 0x201b800000

So my question is, what CUDA API allows me to share the host pinned memory between processes while using MPS?

Robert_Crovella · July 9, 2018, 1:41pm

Q1: No, you would need to use CUDA IPC for this.

Q2: There isn’t any CUDA API for this.
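For reference, here is a minimal sketch of the CUDA IPC route for Q1 on Linux. It assumes the exporting process allocated the buffer with cudaMalloc and that the two processes already share a pipe. cudaIpcGetMemHandle, cudaIpcOpenMemHandle and cudaIpcCloseMemHandle are the documented runtime calls; the helper names and the HAVE_CUDA guard are made up for this example so that the pipe code can be built and checked without the toolkit.

```c
/* Sketch of CUDA IPC over an inherited pipe (Linux).
 * The CUDA part is compiled only with -DHAVE_CUDA (link with -lcudart);
 * HAVE_CUDA is a macro name chosen for this example. */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#ifdef HAVE_CUDA
#include <cuda_runtime.h>
#endif

/* Write exactly len bytes, retrying on short writes and EINTR. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns -1 on error or if the writer closes early. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

#ifdef HAVE_CUDA
/* Exporter: dptr should be the base pointer returned by cudaMalloc, and the
 * allocation must stay alive until every importer has closed its handle. */
int export_device_buffer(int fd, void *dptr)
{
    cudaIpcMemHandle_t handle;
    if (cudaIpcGetMemHandle(&handle, dptr) != cudaSuccess)
        return -1;
    /* The handle is a small opaque struct; its raw bytes can be sent as-is. */
    return write_all(fd, &handle, sizeof handle);
}

/* Importer (a different process): returns a device pointer valid in this
 * process, not necessarily equal to the exporter's value, or NULL.
 * Release it with cudaIpcCloseMemHandle(), not cudaFree(). */
void *import_device_buffer(int fd)
{
    cudaIpcMemHandle_t handle;
    void *dptr = NULL;
    if (read_all(fd, &handle, sizeof handle) != 0)
        return NULL;
    if (cudaIpcOpenMemHandle(&dptr, handle,
                             cudaIpcMemLazyEnablePeerAccess) != cudaSuccess)
        return NULL;
    return dptr;
}
#endif
```

A typical order would be: create the pipe and fork() before either process makes any CUDA call (CUDA state initialized in the parent cannot be used in a forked child), then the parent calls cudaMalloc() and export_device_buffer(), and the child calls import_device_buffer(). This is the documented alternative to handing a raw device pointer to another process.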

sakjain92 · July 9, 2018, 4:33pm

So after some googling, I was able to find the answer to Q2. There is a driver API function, cuMemHostRegister(), which page-locks (pins) an already mmap'ed memory range and registers it with the device. So I create a shared memory segment (using shm_open(), ftruncate() and mmap()) in a main process and then register this memory with the device using cuMemHostRegister(). After this, any process that mmaps this shared memory segment can access the same memory (and the device can also access it, because it is pinned). This lets multiple processes share host pinned memory. (MPS creates a single device address space for all processes. The MPS documentation states this as a limitation: two processes using MPS can clobber each other's device memory, so MPS does not provide memory isolation. I am just using this fact as an advantage instead of a limitation.)
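The steps described above might look roughly like this. It is a sketch under assumptions: Linux with POSIX shared memory, the CUDA driver API with a context already current in the calling thread, and helper names (map_shared_segment, pin_for_device, the HAVE_CUDA guard) invented for illustration. shm_open, ftruncate, mmap, cuMemHostRegister and cuMemHostGetDevicePointer are real calls. Whether a second process's kernels may use memory that only the first process registered is exactly the undocumented MPS behavior discussed in this thread, so the sketch takes the conservative view that each process registers its own mapping.

```c
/* Sketch: a named POSIX shared-memory segment that can also be pinned for
 * the GPU. The CUDA part is compiled only with -DHAVE_CUDA (link with -lcuda);
 * HAVE_CUDA is a macro name chosen for this example. Older glibc needs -lrt. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_CUDA
#include <cuda.h>
#endif

/* Open (and, if create != 0, create and size) the segment `name`, e.g.
 * "/my_segment", and map it MAP_SHARED. Every process that maps the same
 * name sees the same pages. Returns NULL on failure. */
void *map_shared_segment(const char *name, size_t size, int create)
{
    int fd = shm_open(name, O_RDWR | (create ? O_CREAT : 0), 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

#ifdef HAVE_CUDA
/* Page-lock an existing mapping and get a device pointer for it. Needs a
 * current CUDA context. mmap returns page-aligned memory; keeping size a
 * multiple of the page size also satisfies older CUDA versions.
 * Without CU_MEMHOSTREGISTER_PORTABLE the registration belongs to the
 * current context, so each process should register its own mapping and
 * call cuMemHostUnregister() before munmap(). */
int pin_for_device(void *p, size_t size, CUdeviceptr *dptr)
{
    if (cuMemHostRegister(p, size, CU_MEMHOSTREGISTER_DEVICEMAP) != CUDA_SUCCESS)
        return -1;
    if (cuMemHostGetDevicePointer(dptr, p, 0) != CUDA_SUCCESS) {
        cuMemHostUnregister(p);
        return -1;
    }
    return 0;
}
#endif
```

In the setup described above, the main process would call map_shared_segment(name, size, 1) and then pin_for_device(); other processes pass create = 0 with the same name and size. The creator can call shm_unlink(name) once every process has mapped the segment, since existing mappings stay valid until they are unmapped.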

Also, I tried it, and two kernels in two different processes can share device memory through the same device pointer when MPS is enabled (at least on pre-Volta architectures). I didn't need to use IPC. (Without MPS enabled, if one process tries to access memory allocated by another process, I get an "invalid memory" error, which is expected.)

So txbob, am I doing something wrong here, or are there limitations to what I am doing? (Did I just get lucky this time, and will this fail under other conditions, such as having multiple devices?)

Robert_Crovella · July 9, 2018, 4:53pm

For Q1, I don’t believe your method will work on Volta, and on pre-Volta I’m not aware that such functionality is documented or officially supported. In general, I think sharing a bare pointer between two processes without using a documented IPC method is sketchy. You’re welcome to do whatever you wish. It may work (due to the pre-Volta MPS design).

For Q2, I wouldn’t be able to comment on your method.

