CUDA IPC - Virtual Memory API (cuMemImportFromShareableHandle returns CUDA_ERROR_INVALID_DEVICE) - CUDA 11.3

Hello,
I am adapting the CUDA sample memMapIPC into 2 very simple producer / consumer applications. The producer works just fine (so it seems), however my consumer application returns an “invalid device ordinal” error when I call cuMemImportFromShareableHandle. I am not sure why, as this call does not take a deviceID parameter. My code is almost identical to the example code, and I verified the file descriptor is passed through the IPC correctly, so my shareableHandles vector looks accurate. Any insight or suggestions to further debug why cuMemImportFromShareableHandle would return CUDA_ERROR_INVALID_DEVICE = 101?

Producer: Creates, shares, and also accesses the device memory to put some data in it

  • cuMemCreate to allocate
  • cuMemExportToShareableHandle to create the fd
  • cuMemAddressReserve to reserve virtual address space
  • cuMemMap to map for access
  • cuMemSetAccess to set permissions
  • copies some host data into the memory
  • goes to sleep for 30 seconds

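The producer steps above can be sketched with the driver API roughly as follows (a minimal sketch, assuming cuInit and a context on the device already exist; error checking and the actual IPC send are omitted, and function/variable names are illustrative):

```c
#include <cuda.h>
#include <string.h>

int produce(CUdevice dev, size_t size, int *out_fd) {
    CUmemAllocationProp prop;
    memset(&prop, 0, sizeof(prop));
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    /* required so the allocation can be exported as a POSIX fd on Linux */
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    /* size must be a multiple of the allocation granularity */
    size_t gran;
    cuMemGetAllocationGranularity(&gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size = ((size + gran - 1) / gran) * gran;

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, size, &prop, 0);                  /* allocate */
    cuMemExportToShareableHandle(out_fd, handle,           /* create the fd */
        CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);

    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, size, 0, 0, 0);              /* reserve VA space */
    cuMemMap(ptr, size, 0, handle, 0);                     /* map for access */

    CUmemAccessDesc access;
    memset(&access, 0, sizeof(access));
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, size, &access, 1);                 /* set permissions */

    /* ... cuMemcpyHtoD(ptr, hostData, size); then sleep(30); ... */
    return 0;
}
```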
Consumer: Gets the fd via IPC, maps the memory, and accesses the data

  • cuMemAddressReserve to reserve virtual address space
  • cuMemImportFromShareableHandle to turn the int fd back into a CUDA allocation handle
  • cuMemMap to map for access
  • runs kernel to print some data
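The consumer side, following the memMapIPC sample pattern, looks roughly like this (a sketch, assuming the fd has already arrived via IPC and cuInit plus a context exist; error checking omitted; note the sample also calls cuMemSetAccess in the importer, since access permissions are per-process):

```c
#include <cuda.h>
#include <stdint.h>
#include <string.h>

int consume(CUdevice dev, int fd, size_t size) {
    CUmemGenericAllocationHandle handle;
    /* the POSIX fd is passed as an opaque OS handle, cast to void* */
    cuMemImportFromShareableHandle(&handle, (void *)(uintptr_t)fd,
        CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR);

    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, size, 0, 0, 0);   /* reserve VA space */
    cuMemMap(ptr, size, 0, handle, 0);          /* map the imported handle */

    /* the importing process must grant itself access as well */
    CUmemAccessDesc access;
    memset(&access, 0, sizeof(access));
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id = dev;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, size, &access, 1);

    /* ... launch a kernel that reads from ptr ... */
    return 0;
}
```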

Thanks
Bill

Ubuntu 20.04
%nvidia-smi
Thu Apr 22 20:09:06 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:39:00.0 Off |                    0 |
| N/A   32C    P0    44W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:3C:00.0 Off |                    0 |
| N/A   32C    P0    43W / 250W |      4MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

Attached is a complete example demonstrating the issue I am observing.

  • simple-mmap-prod-con.tar.gz
  • md5sum: 3a29415ec0696b88d29b48b776d25f2b

simple-mmap-prod-con.tar.gz (12.0 KB)

After spending some time with the MemMapIPC example, I realize it may not fit my use case.

The file descriptors generated by cuMemExportToShareableHandle into ShareableHandle are local to the exporting process, and can be seen under /proc/<pid>/fd as links to /dev/nvidiactl.

They are not, by themselves, shareable across process boundaries: Process B would need to know Process A's PID and have access to Process A's /proc/<pid>/ filesystem.

The example uses Unix sendmsg / recvmsg with an SCM_RIGHTS control message (CMSG), which has the kernel recreate the file descriptor in process B, effectively a cross-process dup.

I'm not sure what magic happens in the device driver to map these back into CUDA device memory handles.

Since my use case involves using host shared memory as a means to share IPC handles between a producer and many dockerized consumer processes that are unaware of the producer process, it looks like this example is not a good starting point for me.

I have an existing implementation based on cudaIpcGetMemHandle, and it seems I am sticking with that for now… unless there is a way to use those handles with the new API functions (i.e. cuMemAddressReserve, cuMemMap)…
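For comparison, the legacy runtime-API path I'm referring to looks roughly like this (a sketch; cudaIpcMemHandle_t is a plain 64-byte opaque struct, so it can simply be copied into a shared-memory segment for any consumer to pick up, with no fd passing):

```c
#include <cuda_runtime.h>

/* producer side: devPtr comes from cudaMalloc */
cudaIpcMemHandle_t share_allocation(void *devPtr) {
    cudaIpcMemHandle_t h;
    cudaIpcGetMemHandle(&h, devPtr);    /* handle is process-independent */
    return h;                           /* e.g. memcpy into a shm segment */
}

/* consumer side: any process that can read the handle bytes */
void *open_allocation(cudaIpcMemHandle_t h) {
    void *devPtr = NULL;
    cudaIpcOpenMemHandle(&devPtr, h, cudaIpcMemLazyEnablePeerAccess);
    return devPtr;                      /* later: cudaIpcCloseMemHandle(devPtr) */
}
```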

Regarding that: is a CUmemGenericAllocationHandle the same thing as the handle returned by cudaIpcGetMemHandle?

Thanks
Bill