CUDA IPC replacement for Jetson

Hello!

Jetson currently does not support CUDA IPC, which torch.multiprocessing needs in order to share CUDA tensors between processes. This problem has been raised several times on this forum (1, 2, 3, 4), but no clear solution, fix, or workaround was posted. After struggling with it for some time, I found a way around it and thought I would share it here in case others find it useful.

Working example:
Install torch and cuda-python and run the following script:

It creates a shared memory slot that can be accessed from different processes.

I also wrote a queue and a buffer that can be used to share CUDA tensors across different processes.
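The queue and buffer themselves are not reproduced in this thread. Any such wrapper eventually has to present a raw device pointer to PyTorch as a tensor, and one way to do that without copying is the CUDA Array Interface. The sketch below is an assumption about how that step could look, not Jakub's code; `DeviceBuffer` is a hypothetical name.

```python
import math


class DeviceBuffer:
    """Hypothetical typed view of an imported device allocation.

    Implements the CUDA Array Interface (version 3) so that consumers such
    as PyTorch can wrap the memory without copying it. The object does not
    own the mapping: whoever mapped the pointer must keep it mapped while
    any tensor built from this view is in use.
    """

    def __init__(self, ptr: int, shape: tuple, typestr: str = "<f4"):
        self.ptr = int(ptr)          # device virtual address from cuMemMap
        self.shape = tuple(shape)
        self.typestr = typestr       # "<f4" = little-endian float32

    @property
    def nbytes(self) -> int:
        itemsize = int(self.typestr[2:])  # e.g. "<f4" -> 4 bytes per element
        return math.prod(self.shape) * itemsize

    @property
    def __cuda_array_interface__(self) -> dict:
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, False),  # (pointer, read_only)
            "strides": None,            # None means C-contiguous
            "version": 3,
        }
```

With a mapped pointer `ptr`, `torch.as_tensor(DeviceBuffer(ptr, (256,)), device="cuda")` should give a float32 tensor aliasing the shared memory, provided the pointer was mapped in the context PyTorch uses (the device's primary context) and `nbytes` does not exceed the mapped size.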

Explanation:
First, we allocate physical memory with cuMemCreate, reserve a virtual address range with cuMemAddressReserve, map the physical memory into that range with cuMemMap, and set access permissions with cuMemSetAccess. Now we can use the most important function, cuMemExportToShareableHandle, which exports the allocation as a file descriptor that refers to it and can be shared between processes.

However, a file descriptor number is only meaningful inside the process that owns it, so to share it correctly we need to send it through a Unix domain socket. A child process connected to this socket can then receive the file descriptor and import an allocation handle from it via cuMemImportFromShareableHandle. Lastly, the child uses the allocation handle to obtain a pointer: it reserves a virtual address range, maps the allocation onto it, and sets access permissions.
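A minimal sketch of both sides of this sequence with the cuda-python driver bindings follows. It is not Jakub's script; the helper names (`check`, `align_up`, `init_context`, `export_allocation`, `import_allocation`) are hypothetical, and it assumes a cuda-python release whose driver functions return `(CUresult, value...)` tuples and accept a plain integer file descriptor in `cuMemImportFromShareableHandle`. Two details the prose glosses over: the allocation must request `CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR` when it is created, or the export fails, and sizes must be rounded up to the granularity reported by `cuMemGetAllocationGranularity`.

```python
# Assumption: newer cuda-python exposes the driver API as cuda.bindings.driver,
# older releases as cuda.cuda. If neither is installed the helpers stay unusable.
try:
    from cuda.bindings import driver as cuda
except ImportError:
    try:
        from cuda import cuda
    except ImportError:
        cuda = None


def align_up(size: int, granularity: int) -> int:
    """Round size up to a multiple of the allocation granularity."""
    return ((size + granularity - 1) // granularity) * granularity


def check(result):
    """Raise on a non-success CUresult; return the remaining outputs."""
    err, *rest = result
    if err != cuda.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA driver error: {err}")
    return rest[0] if len(rest) == 1 else rest


def init_context(device: int = 0):
    """Make the device's primary context current (the one PyTorch uses)."""
    if cuda is None:
        raise ImportError("cuda-python is not installed")
    check(cuda.cuInit(0))
    dev = check(cuda.cuDeviceGet(device))
    ctx = check(cuda.cuDevicePrimaryCtxRetain(dev))
    check(cuda.cuCtxSetCurrent(ctx))


FD_TYPE = "CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR"


def _map(handle, size: int, device: int):
    """Reserve a VA range, map the allocation into it, grant read/write."""
    ptr = check(cuda.cuMemAddressReserve(size, 0, 0, 0))
    check(cuda.cuMemMap(ptr, size, 0, handle, 0))
    desc = cuda.CUmemAccessDesc()
    desc.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    desc.location.id = device
    desc.flags = cuda.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    check(cuda.cuMemSetAccess(ptr, size, [desc], 1))
    return ptr


def export_allocation(nbytes: int, device: int = 0):
    """Exporter: allocate, map, and export the allocation as a POSIX fd."""
    fd_type = getattr(cuda.CUmemAllocationHandleType, FD_TYPE)
    prop = cuda.CUmemAllocationProp()
    prop.type = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
    prop.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    prop.location.id = device
    prop.requestedHandleTypes = fd_type  # required for the export below
    gran = check(cuda.cuMemGetAllocationGranularity(
        prop, cuda.CUmemAllocationGranularity_flags.CU_MEM_ALLOC_GRANULARITY_MINIMUM))
    size = align_up(nbytes, gran)
    handle = check(cuda.cuMemCreate(size, prop, 0))
    ptr = _map(handle, size, device)
    fd = check(cuda.cuMemExportToShareableHandle(handle, fd_type, 0))
    return ptr, size, fd


def import_allocation(fd: int, size: int, device: int = 0):
    """Importer: turn a received fd back into a mapped device pointer."""
    fd_type = getattr(cuda.CUmemAllocationHandleType, FD_TYPE)
    handle = check(cuda.cuMemImportFromShareableHandle(fd, fd_type))
    return _map(handle, size, device)
```

The exporter would call `init_context()` and `export_allocation(nbytes)`, send `fd` and `size` to the other process, and that process would call `init_context()` and `import_allocation(fd, size)`. Cleanup (`cuMemUnmap`, `cuMemAddressFree`, `cuMemRelease`, closing the fd) is omitted for brevity.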

I hope this helps whoever needs to share CUDA tensors between different processes on Jetson. I am also happy to hear any feedback on my solution, since I only started working with CUDA recently.

Best,
Jakub


*** Please note that this reply was generated automatically by an LLM ***

This is not a traditional question but a post from a user named Jakub, who shares a solution to a problem on the Jetson platform: CUDA IPC (Inter-Process Communication) is not supported on Jetson, yet it is necessary for using CUDA tensors with PyTorch's multiprocessing module.

Jakub mentions that this problem has been raised multiple times on the forum, but no clear solutions or workarounds were posted. After struggling with the problem, Jakub found a way around it and is sharing their solution.

The solution involves installing torch and cuda-python and running a script from GitHub, which creates a shared memory slot that can be accessed from different processes. Jakub also wrote a queue and a buffer that can be used to share CUDA tensors across different processes.

The solution uses the following steps:

  1. Allocate physical memory with cuMemCreate.
  2. Reserve virtual address space with cuMemAddressReserve.
  3. Map the physical memory into the virtual address space with cuMemMap and set access permissions with cuMemSetAccess.
  4. Export the memory allocation to a file descriptor with cuMemExportToShareableHandle.
  5. Send the file descriptor to the other process through a Unix socket.
  6. In the child process, receive the file descriptor and import an allocation handle from it with cuMemImportFromShareableHandle.
  7. Use the allocation handle to obtain a pointer by reserving a virtual address range, mapping the allocation onto it, and setting access permissions.
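Step 5 is plain POSIX and involves no CUDA at all, so it can be shown on its own. A minimal sketch, assuming Python 3.9+ for `socket.send_fds`/`socket.recv_fds`; the helper names `send_fd` and `recv_fd` and the choice to send the allocation size alongside the descriptor are illustrative, not taken from Jakub's code.

```python
import os
import socket

SIZE_BYTES = 8  # the allocation size travels as an 8-byte little-endian integer


def send_fd(sock: socket.socket, fd: int, size: int) -> None:
    """Send one file descriptor plus the allocation size over a Unix socket.

    The kernel installs a duplicate of `fd` in the receiving process
    (SCM_RIGHTS ancillary data); sending the integer value alone would
    not work because descriptor numbers are per-process.
    """
    socket.send_fds(sock, [size.to_bytes(SIZE_BYTES, "little")], [fd])


def recv_fd(sock: socket.socket) -> tuple[int, int]:
    """Receive the (fd, size) pair sent by send_fd."""
    data, fds, _flags, _addr = socket.recv_fds(sock, SIZE_BYTES, 1)
    if len(fds) != 1 or len(data) != SIZE_BYTES:
        raise RuntimeError("expected one descriptor and an 8-byte size")
    return fds[0], int.from_bytes(data, "little")


if __name__ == "__main__":
    # Demo within one process: a pipe's read end stands in for the CUDA fd.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    send_fd(left, read_end, 4096)
    received, size = recv_fd(right)
    os.write(write_end, b"hello")
    print(os.read(received, 5), size)  # b'hello' 4096
```

In practice the two ends would live in different processes, for example an `AF_UNIX` socket bound to a filesystem path that the child connects to; the exporter passes the descriptor from `cuMemExportToShareableHandle` to `send_fd`, and the child hands the received descriptor to `cuMemImportFromShareableHandle`.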

Jakub hopes that this solution will help others who need to share CUDA tensors between different processes on Jetson. They also welcome feedback on their solution, as they are new to working with CUDA.

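Compared with other solutions, Jakub's approach is a creative workaround: instead of CUDA IPC, it uses the CUDA driver's virtual memory management API to export GPU allocations as file descriptors and passes those descriptors between processes over Unix sockets. It is aimed at Jetson, where CUDA IPC is unavailable; on platforms that support CUDA IPC, torch.multiprocessing can share CUDA tensors directly.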

The solution also involves PyTorch, a deep learning framework that is widely used on the Jetson platform.

Overall, Jakub’s solution is a valuable contribution to the community, and their willingness to share their knowledge and expertise is appreciated.


Hi,

Thanks for sharing the experience.