Why exporting and importing CUDA IPC handles in the scope of the same Linux process is not supported?

dreqeu · May 9, 2023, 9:37am

Hi,

I am aware that opening back a CUDA IPC handle within the scope of the same Linux process is not supported. As a result, in the below code excerpt the call to cudaIpcOpenMemHandle fails before the assert is even reached(error checks have been removed to reduce verbosity):

void* mem_ptr;
cudaMalloc(&mem_ptr, MEM_SIZE);

// Get memory handle
cudaIpcMemHandle_t ipc_handle;
cudaIpcGetMemHandle(&ipc_handle, mem_ptr);

// Use the memory handle to access the allocated memory
void* mem_ptr_2;
cudaIpcOpenMemHandle(&mem_ptr_2, ipc_handle, cudaIpcMemLazyEnablePeerAccess);

//
assert(mem_ptr == mem_ptr_2)

It’d be interesting to understand why such limitation exists in CUDA IPC, and whether there are ways to work around it?

MarkusHoHo · May 9, 2023, 1:45pm

Hi @dreqeu, welcome back.

I propose we move this post to the CUDA programming category, I am sure there will be someone able to answer this.

Thanks!

Robert_Crovella · May 9, 2023, 1:53pm

since the original allocation is already available in the allocating process, there is no need to use IPC to connect to it, and indeed it is not an example of Inter-Process communication. To “work around” this, just use the allocated pointer directly in the allocating process.

dreqeu · May 9, 2023, 2:33pm

Agreed!

dreqeu · May 9, 2023, 2:50pm

I agree @Robert_Crovella, at first glance importing the IPC handler back into the address space of the exporting OS process may seem gratuitous. On the other hand, I see no logical reason for this pattern to not work in theory, but I bet in practice there is a motivation behind it. Maybe you can shed some light on the practical side?

Robert_Crovella · May 9, 2023, 2:55pm

You mean shed some light on what is going on under the hood? No, I don’t have any comments on that. From a programmers perspective, it works the way it works, and it is certainly possible to access an IPC-oriented allocation from the allocating process. You just don’t use the IPC handle to do it, you use the original allocated pointer.

Robert_Crovella · May 9, 2023, 11:06pm

I haven’t studied it carefully, but cuMemExportToShareableHandle() may also be something to check out. I still don’t think it gives you the symmetry you are looking for, but I haven’t tried exporting and importing a handle to the same process.

dreqeu · May 10, 2023, 8:35am

Thanks for pointing out the low-level virtual management API Robert.

The main motivation behind the apparent gratuitous pattern is I’d like to wrap device pointers in shared memory friendly shared pointers - ShmSharedPtr - to manage the lifetime of the device memory automatically across the boundaries of multiple Os processes. Once the raw device pointer is wrapped in ShmSharedPtr, the resulting object is to be passed around within the scope of the OS process that allocated the device memory and created the ShmSharedPtr instance as well as in other sibling OS processes. Ideally, ShmSharedPtr should internally hold the IPC handle only and call cudaIpcOpenMemHandle to get hold of the actual device pointer whenever the ShmSharedPtr instance is dereferenced.

Now, ShmSharedPtr could track the PID that allocated the device memory and use either the raw pointer or the IPC handle depending on the PID dereferencing the ShmSharedPtr instance, but that’s an extra layer of complexity that adds to an already complicated design and was looking for ways to simplify it.

Topic		Replies	Views
cudaIpcGetMemHandle with mapped/pinned memory CUDA Programming and Performance	9	4488	April 14, 2025
Access unified memory from a different process CUDA Programming and Performance	5	547	August 31, 2022
How to share CUDA memory between two processes? CUDA Programming and Performance	3	2925	July 9, 2018
Why do SimpleIPC example use sharedMemoryCreate before cudaMalloc? CUDA Programming and Performance cuda	2	1266	July 19, 2022
CUDA IPC - Virtual Memory API (cuMemImportFromShareableHandle CUDA_ERROR_INVALID_DEVICE ) - CUDA 11.3 CUDA Programming and Performance	4	2069	May 19, 2023
How to access gpu memory between processes CUDA Programming and Performance	10	2598	August 4, 2023
'invalid device ordinal' (cudaErrorInvalidDevice) CUDA Programming and Performance	6	5598	August 25, 2015
Share GPU/host pinned memory between host processes CUDA Programming and Performance	5	4024	March 7, 2012
Access Unified Memory location from two different application CUDA Programming and Performance	4	503	January 2, 2023
GPU Inter-Process Communications(IPC) question CUDA Programming and Performance	13	15266	January 4, 2023

Why exporting and importing CUDA IPC handles in the scope of the same Linux process is not supported?

Related topics