Multiprocessing PyTorch inference with TensorRT on Jetson Orin NX devices

Hi there,

I recently came across this issue relating to torch.multiprocessing on Jetson devices. The most recent answer, posted 18 months ago, advised that torch.multiprocessing could not be used on Jetson devices because:

  • Jetson devices use NvSCI IPC for memory sharing
  • PyTorch uses CUDA IPC for memory sharing and doesn’t support NvSCI

I’m hoping to run multi-process inference with a PyTorch model on a Jetson Orin NX device - does a solution to this exist now?
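
As a quick check of the underlying limitation, the failing call can be probed directly. _share_cuda_() is the private hook torch.multiprocessing invokes when pickling CUDA tensors, so treat this as a diagnostic only, not a supported API:

import torch

# Probe the call that torch.multiprocessing uses to export a CUDA
# allocation to another process (private API, diagnostic only).
t = torch.randn(4, device="cuda")
try:
    t.untyped_storage()._share_cuda_()
    print("CUDA IPC export works on this device")
except RuntimeError as e:
    print(f"CUDA IPC export failed: {e}")

On the Orin NX this fails with the same “operation not supported” error shown in the full traceback below.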

I’ve done some tests running PyTorch’s Hogwild multiprocessing example, and received the following error:

$ python3 main.py --cuda
Traceback (most recent call last):
  File "main.py", line 96, in <module>
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 261, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
  File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 920, in _share_cuda_
    return self._untyped_storage._share_cuda_(*args, **kwargs)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
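
The failure isn’t specific to Hogwild - a minimal script (tensor shape arbitrary) that hits the same reduce_tensor path is:

import torch
import torch.multiprocessing as mp

def worker(t):
    # Never reached on Jetson: with the spawn start method, pickling the
    # CUDA tensor argument calls storage._share_cuda_() during start().
    print(t.sum())

if __name__ == "__main__":
    mp.set_start_method("spawn")
    t = torch.randn(8, device="cuda")
    p = mp.Process(target=worker, args=(t,))
    p.start()  # RuntimeError: CUDA error: operation not supported
    p.join()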

Some information about the system I’m using:

# System
L4T: 35.3.1
JetPack: 5.1.1
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2

# Python
python=3.8.10
torch=2.0.0+nv23.5
torch2trt=0.4.0
torchvision=0.15.1a0+42759b1

I’ve also tried downgrading torch2trt to 0.2.0 as suggested in this post but got the same error.

Appreciate any help or suggestions.

Thanks!
Andrew

Hi,

Unfortunately, no.
Supporting this would require PyTorch to implement its multiprocessing memory sharing with the NvSCI API.
So it’s recommended to create a feature request on the PyTorch repo/forum.
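
In the meantime, one pattern that avoids CUDA IPC entirely is to give each worker process its own copy of the model on the GPU and exchange only CPU tensors between processes (the failure above only affects GPU-resident memory). A rough sketch - the Linear layer is just a placeholder for the real model, and this is untested on JetPack 5.1.1:

import torch
import torch.multiprocessing as mp

def inference_worker(in_q, out_q):
    # Each process creates its own CUDA context and model copy;
    # nothing GPU-resident crosses the process boundary.
    model = torch.nn.Linear(128, 10).cuda().eval()  # placeholder model
    while True:
        x = in_q.get()  # CPU tensor, shared via ordinary shared memory
        if x is None:  # sentinel: shut down
            break
        with torch.no_grad():
            out_q.put(model(x.cuda()).cpu())  # move in, infer, move back

if __name__ == "__main__":
    mp.set_start_method("spawn")
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=inference_worker, args=(in_q, out_q))
               for _ in range(2)]
    for w in workers:
        w.start()
    for _ in range(4):
        in_q.put(torch.randn(1, 128))  # CPU tensors only
    results = [out_q.get() for _ in range(4)]
    print(f"collected {len(results)} outputs")
    for _ in workers:
        in_q.put(None)
    for w in workers:
        w.join()

Each process allocates its own CUDA context and model copy, so memory use grows with the worker count - keep it small on the Orin NX.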

Thanks.
