Hello,
I am adapting the CUDA sample MemMapIPC into 2 very simple producer / consumer applications. The producer works just fine (so it seems); however, my consumer application returns an “invalid device ordinal” error when I call cuMemImportFromShareableHandle. I am not sure why, as this call does not take a deviceID parameter. My code is almost identical to the example code, and I verified the file descriptor is passed through the IPC correctly, so my shareableHandles vector looks accurate. Any insight or suggestions on how to further debug why cuMemImportFromShareableHandle would return CUDA_ERROR_INVALID_DEVICE (101)?
Producer: Creates, Shares, and also accesses the device memory to put some data in it (sketch after the list)
cuMemCreate to allocate
cuMemExportToShareableHandle to create the fd
cuMemAddressReserve to reserve virtual address space
cuMemMap to map for access
cuMemSetAccess to set permissions
copies some host data into the memory
goes to sleep for 30 seconds
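For concreteness, here is a minimal sketch of that producer sequence (assuming device 0, POSIX file-descriptor handles, and a one-granule allocation; error handling is collapsed into a macro, so this is not my exact code):

```c
#include <cuda.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    fprintf(stderr, "%s failed: %d\n", #call, r); return 1; } } while (0)

int main(void) {
    CUdevice dev; CUcontext ctx;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* The allocation must request an exportable handle type up front. */
    CUmemAllocationProp prop;
    memset(&prop, 0, sizeof(prop));
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    size_t gran;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t size = gran;               /* sizes must be granularity-aligned */

    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, size, &prop, 0));           /* allocate */

    int fd = -1;
    CHECK(cuMemExportToShareableHandle(&fd, handle,        /* create the fd */
            CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0));

    CUdeviceptr ptr;
    CHECK(cuMemAddressReserve(&ptr, size, 0, 0, 0));       /* reserve VA */
    CHECK(cuMemMap(ptr, size, 0, handle, 0));              /* map for access */

    CUmemAccessDesc acc;
    memset(&acc, 0, sizeof(acc));
    acc.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    acc.location.id = dev;
    acc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(ptr, size, &acc, 1));             /* set permissions */

    char msg[] = "hello from producer";
    CHECK(cuMemcpyHtoD(ptr, msg, sizeof(msg)));            /* put data in it */

    /* ...hand the fd to the consumer over the IPC channel here... */
    sleep(30);                        /* keep the allocation alive */

    CHECK(cuMemUnmap(ptr, size));
    CHECK(cuMemAddressFree(ptr, size));
    CHECK(cuMemRelease(handle));
    return 0;
}
```

(Build with -lcuda.)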
Consumer: Gets the fd via IPC, maps the memory, and accesses the data (sketch after the list)
cuMemAddressReserve to reserve virtual address space
cuMemImportFromShareableHandle to turn the int fd back into a CUmemGenericAllocationHandle
cuMemMap to map for use (this needs the imported handle, so the import has to come first)
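And a matching consumer sketch (assuming the fd and the producer's granularity-aligned size have already arrived over IPC; again not my exact code):

```c
#include <cuda.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    fprintf(stderr, "%s failed: %d\n", #call, r); return 1; } } while (0)

/* fd:   descriptor received over IPC
   size: the same granularity-aligned size the producer used */
int consume(int fd, size_t size) {
    CUdevice dev; CUcontext ctx;
    CHECK(cuInit(0));                 /* a current context must exist */
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));

    CUmemGenericAllocationHandle handle;
    CHECK(cuMemImportFromShareableHandle(&handle,          /* fd -> handle */
            (void *)(uintptr_t)fd,
            CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR));

    CUdeviceptr ptr;
    CHECK(cuMemAddressReserve(&ptr, size, 0, 0, 0));       /* reserve VA */
    CHECK(cuMemMap(ptr, size, 0, handle, 0));              /* map for use */

    CUmemAccessDesc acc;
    memset(&acc, 0, sizeof(acc));
    acc.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    acc.location.id = dev;
    acc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(ptr, size, &acc, 1));             /* set permissions */

    char buf[64] = {0};
    CHECK(cuMemcpyDtoH(buf, ptr, sizeof(buf)));            /* read the data */
    printf("consumer read: %s\n", buf);

    CHECK(cuMemUnmap(ptr, size));
    CHECK(cuMemAddressFree(ptr, size));
    CHECK(cuMemRelease(handle));
    return 0;
}
```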
After spending some time with the MemMapIPC example, I realize it may not fit my use case.
The file descriptors that cuMemExportToShareableHandle writes into ShareableHandle are local to the exporting process, and can be viewed under /proc/<pid>/fd as links to /dev/nvidiactl.
They are not, by themselves, shareable across process boundaries: process B would need to know process A's PID and have access to the /proc/<pid-a>/ filesystem.
The example instead uses Unix sendmsg / recvmsg with an SCM_RIGHTS control message, which has the kernel install duplicates of the file descriptors in process B (effectively a cross-process dup), roughly as in the sketch below.
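Stripped to its essentials, that fd-passing mechanism looks roughly like this (a generic Linux sketch, not the sample's exact helper code):

```c
#include <string.h>
#include <sys/socket.h>

/* Send one file descriptor over a connected UNIX-domain socket. */
static int send_fd(int sock, int fd) {
    char data = 0;                    /* must send at least one byte */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* Receive one file descriptor: the kernel installs a duplicate of the
   sender's descriptor in this process's fd table (a cross-process dup). */
static int recv_fd(int sock) {
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctrl.buf,
                          .msg_controllen = sizeof(ctrl.buf) };
    if (recvmsg(sock, &msg, 0) <= 0) return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS) return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```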
I'm not sure what magic happens in the device driver to map these back into CUDA device memory handles.
Since my use case involves using host shared memory as a means to share IPC handles between a producer and many dockerized consumer processes that are unaware of the producer process, it looks like this example is not a good starting point for me.
I have an existing implementation based on cudaIpcGetMemHandle, and it seems I am sticking with that for now… unless there is a way to use those handles with the new API functions (e.g. cuMemAddressReserve, cuMemMap)…
Regarding that: is a CUmemGenericAllocationHandle the same thing as the handle returned by cudaIpcGetMemHandle?
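For reference, the legacy path I am keeping is roughly the following (a sketch, not my exact code). The practical difference is that cudaIpcMemHandle_t is a 64-byte opaque blob that can simply be copied through host shared memory or a file, with no descriptor passing involved:

```c
#include <cuda_runtime.h>

/* Producer side: allocate device memory and export a runtime IPC handle. */
int export_legacy(cudaIpcMemHandle_t *out, void **devPtr, size_t bytes) {
    if (cudaMalloc(devPtr, bytes) != cudaSuccess) return -1;
    /* *out is a plain 64-byte struct: memcpy it into host shared memory,
       write it to a file, etc. -- no fd or PID knowledge required. */
    return cudaIpcGetMemHandle(out, *devPtr) == cudaSuccess ? 0 : -1;
}

/* Consumer side: open the handle it read from shared memory. */
int import_legacy(cudaIpcMemHandle_t h, void **devPtr) {
    /* Pair with cudaIpcCloseMemHandle(*devPtr) when done. */
    return cudaIpcOpenMemHandle(devPtr, h,
               cudaIpcMemLazyEnablePeerAccess) == cudaSuccess ? 0 : -1;
}
```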
I got the same CUDA_ERROR_INVALID_DEVICE when calling cuMemImportFromShareableHandle from the same process that exported it. (I acknowledge that's a weird use case.)
Still, you point out that this call doesn't work across process address spaces, which is a glaring gap in the documentation. But I observe the same error even when the call is made from the same thread in the same address space.
The only common thread here is Docker. Perhaps that causes driver issues?