Multiprocessing PyTorch inference with TensorRT on Jetson Orin NX devices

Hi there,

I recently came across this issue relating to torch.multiprocessing on Jetson devices. The most recent answer, posted 18 months ago, advised that torch.multiprocessing could not be used on Jetson devices because:

  • Jetson devices use NvSCI IPC for memory sharing
  • PyTorch uses CUDA IPC for memory sharing and doesn’t support NvSCI

I’m hoping to run multi-process inference with a PyTorch model on a Jetson Orin NX device - does a solution to this exist now?
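
As a quick check of the underlying limitation, the failing call can be probed directly. _share_cuda_() is the private hook torch.multiprocessing invokes when pickling CUDA tensors, so treat this as a diagnostic only, not a supported API:

import torch

# Probe the call that torch.multiprocessing uses to export a CUDA
# allocation to another process (private API, diagnostic only).
t = torch.randn(4, device="cuda")
try:
    t.untyped_storage()._share_cuda_()
    print("CUDA IPC export works on this device")
except RuntimeError as e:
    print(f"CUDA IPC export failed: {e}")

On the Orin NX this fails with the same “operation not supported” error shown in the full traceback below.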

I’ve done some tests running PyTorch’s Hogwild multiprocessing example, and received the following error:

$ python3 main.py --cuda
Traceback (most recent call last):
  File "main.py", line 96, in <module>
    p.start()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 261, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
  File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 920, in _share_cuda_
    return self._untyped_storage._share_cuda_(*args, **kwargs)
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
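
The failure isn’t specific to Hogwild - a minimal script (tensor shape arbitrary) that hits the same reduce_tensor path is:

import torch
import torch.multiprocessing as mp

def worker(t):
    # Never reached on Jetson: with the spawn start method, pickling the
    # CUDA tensor argument calls storage._share_cuda_() during start().
    print(t.sum())

if __name__ == "__main__":
    mp.set_start_method("spawn")
    t = torch.randn(8, device="cuda")
    p = mp.Process(target=worker, args=(t,))
    p.start()  # RuntimeError: CUDA error: operation not supported
    p.join()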

Some information about the system I’m using:

# System
L4T: 35.3.1
JetPack: 5.1.1
CUDA: 11.4.315
cuDNN: 8.6.0.166
TensorRT: 8.5.2.2

# Python
python=3.8.10
torch=2.0.0+nv23.5
torch2trt=0.4.0
torchvision=0.15.1a0+42759b1

I’ve also tried downgrading torch2trt to 0.2.0 as suggested in this post but got the same error.

Appreciate any help or suggestions.

Thanks!
Andrew

Hi,

Unfortunately, no.
Supporting this would require PyTorch to implement its multiprocessing memory sharing with the NvSCI API.
So it’s recommended to create a feature request on the PyTorch repo/forum.
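
In the meantime, one pattern that avoids CUDA IPC entirely is to give each worker process its own copy of the model on the GPU and exchange only CPU tensors between processes (the failure above only affects GPU-resident memory). A rough sketch - the Linear layer is just a placeholder for the real model, and this is untested on JetPack 5.1.1:

import torch
import torch.multiprocessing as mp

def inference_worker(in_q, out_q):
    # Each process creates its own CUDA context and model copy;
    # nothing GPU-resident crosses the process boundary.
    model = torch.nn.Linear(128, 10).cuda().eval()  # placeholder model
    while True:
        x = in_q.get()  # CPU tensor, shared via ordinary shared memory
        if x is None:  # sentinel: shut down
            break
        with torch.no_grad():
            out_q.put(model(x.cuda()).cpu())  # move in, infer, move back

if __name__ == "__main__":
    mp.set_start_method("spawn")
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=inference_worker, args=(in_q, out_q))
               for _ in range(2)]
    for w in workers:
        w.start()
    for _ in range(4):
        in_q.put(torch.randn(1, 128))  # CPU tensors only
    results = [out_q.get() for _ in range(4)]
    print(f"collected {len(results)} outputs")
    for _ in workers:
        in_q.put(None)
    for w in workers:
        w.join()

Each process allocates its own CUDA context and model copy, so memory use grows with the worker count - keep it small on the Orin NX.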

Thanks.
