Hello!
Currently, Jetson does not support CUDA IPC, which torch.multiprocessing requires for sharing CUDA tensors between processes. This problem was raised multiple times on this forum (1, 2, 3, 4); however, no clear solutions, fixes, or workarounds were posted. After struggling with this problem for some time I found a way around it, and I thought I would share it here in case others find it useful.
Working example:
Install torch and cuda-python, then run the following script: it creates a shared memory slot which can be accessed from different processes.
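The original script is not reproduced in this excerpt, but the allocation-and-export side can be sketched roughly as follows with cuda-python. This is a hedged sketch, not the exact script from the post: the names `create_shareable_allocation` and `round_up` are my own, a CUDA context is assumed to already exist, and driver error codes are not checked for brevity.

```python
def round_up(size: int, granularity: int) -> int:
    # cuMemCreate requires the size to be a multiple of the allocation
    # granularity, so round the requested size up to it.
    return ((size + granularity - 1) // granularity) * granularity


def create_shareable_allocation(nbytes: int, device: int = 0):
    """Sketch: allocate device memory exportable as a POSIX file descriptor.

    Hypothetical helper, not the author's script. Assumes cuda-python is
    installed and a CUDA context already exists on `device`; error codes
    returned by the driver calls are ignored for brevity.
    """
    from cuda import cuda  # deferred so round_up() works without a GPU

    # Describe a pinned device allocation that can be exported as an fd.
    prop = cuda.CUmemAllocationProp()
    prop.type = cuda.CUmemAllocationType.CU_MEM_ALLOCATION_TYPE_PINNED
    prop.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    prop.location.id = device
    prop.requestedHandleTypes = (
        cuda.CUmemAllocationHandleType.CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR
    )

    # Sizes must be rounded up to the allocation granularity.
    err, gran = cuda.cuMemGetAllocationGranularity(
        prop,
        cuda.CUmemAllocationGranularity_flags.CU_MEM_ALLOC_GRANULARITY_MINIMUM,
    )
    size = round_up(nbytes, gran)

    # 1) physical allocation, 2) virtual address reservation,
    # 3) map physical memory into the reservation, 4) set access rights.
    err, handle = cuda.cuMemCreate(size, prop, 0)
    err, ptr = cuda.cuMemAddressReserve(size, 0, 0, 0)
    (err,) = cuda.cuMemMap(ptr, size, 0, handle, 0)

    access = cuda.CUmemAccessDesc()
    access.location = prop.location
    access.flags = cuda.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    (err,) = cuda.cuMemSetAccess(ptr, size, [access], 1)

    # Export the physical allocation as a file descriptor that another
    # process can import with cuMemImportFromShareableHandle.
    err, fd = cuda.cuMemExportToShareableHandle(
        handle,
        cuda.CUmemAllocationHandleType.CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR,
        0,
    )
    return ptr, size, fd
```

The granularity query matters in practice: asking cuMemCreate for an unaligned size fails, so even a one-byte slot ends up occupying a full granule.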
I also wrote a queue and a buffer that can be used to share CUDA tensors across different processes.
Explanation:
First we allocate physical memory using cuMemCreate, then reserve a virtual address range with cuMemAddressReserve, map the physical memory into that range with cuMemMap, and set access permissions with cuMemSetAccess. Now we can use the most important function, cuMemExportToShareableHandle, which exports the memory allocation as a file descriptor that can be shared between processes. However, to share the file descriptor correctly we need to send it through a Unix domain socket. A child process holding the other end of that socket can then receive the file descriptor and import it back into an allocation handle via cuMemImportFromShareableHandle. Lastly, the child uses the allocation handle to obtain a pointer by reserving a virtual address range, mapping the allocation into it, and setting access permissions, mirroring the steps in the parent.
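The fd-passing and child-side steps described above can be sketched as follows. The `send_fd`/`recv_fd` helpers use `socket.send_fds`/`socket.recv_fds` (Python 3.9+, SCM_RIGHTS ancillary data underneath); `import_shared_allocation` is a hypothetical name for the import-and-map sequence and, like the parent side, assumes an existing CUDA context and skips error checking. The demo at the bottom exercises only the fd passing, since that part needs no GPU.

```python
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    # Pass one file descriptor over a connected AF_UNIX socket
    # (SCM_RIGHTS under the hood; Python 3.9+ on Unix).
    socket.send_fds(sock, [b"fd"], [fd])


def recv_fd(sock: socket.socket) -> int:
    # Receive a single file descriptor from the peer.
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]


def import_shared_allocation(fd: int, size: int, device: int = 0):
    # Hypothetical child-side sequence: import the fd back into an
    # allocation handle, then reserve / map / set access to get a pointer.
    from cuda import cuda  # deferred; requires cuda-python and a GPU

    err, handle = cuda.cuMemImportFromShareableHandle(
        fd,
        cuda.CUmemAllocationHandleType.CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR,
    )
    err, ptr = cuda.cuMemAddressReserve(size, 0, 0, 0)
    (err,) = cuda.cuMemMap(ptr, size, 0, handle, 0)

    access = cuda.CUmemAccessDesc()
    access.location.type = cuda.CUmemLocationType.CU_MEM_LOCATION_TYPE_DEVICE
    access.location.id = device
    access.flags = cuda.CUmemAccess_flags.CU_MEM_ACCESS_FLAGS_PROT_READWRITE
    (err,) = cuda.cuMemSetAccess(ptr, size, [access], 1)
    return ptr


# Demo of the fd-passing part alone (no GPU needed): hand the read end
# of a pipe from one socket endpoint to the other, then read through
# the received descriptor.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
send_fd(left, r)
received = recv_fd(right)
data = os.read(received, 16)
print(data)  # b'hello'
```

Note that the received descriptor is a duplicate in the child's fd table; the same mechanism carries the descriptor produced by cuMemExportToShareableHandle.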
I hope this helps whoever needs to share CUDA tensors between different processes on Jetson. Furthermore, I am happy to hear any feedback on my solution, since I only started working with CUDA recently.
Best,
Jakub