Parallel multi-model inference on Jetson AGX Orin

Description

I want to run parallel inference with multiple models on the same input image: for each input frame, two models must run at the same time. I tried multi-threading and multi-processing, but when inference runs on the GPU, the models still execute sequentially (not in parallel). I also tried splitting inference between the DLA and the GPU, but the DLA does not support all layers, so the inference time is very poor. Please tell me the best way to run parallel inference with multiple models on the same input image. Many thanks!!!
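For reference, the usual single-GPU pattern in TensorRT is to build one engine per model, give each model its own execution context and its own CUDA stream, and enqueue both from separate threads (or even one thread) so the kernels can overlap when the GPU has free SMs. The sketch below shows only the dispatch pattern; the actual TensorRT calls (`execute_async_v2` on a per-context CUDA stream, then a stream synchronize) are stubbed with timed placeholders, and all function names here are illustrative, not from the original post.

```python
import threading
import time

def infer_model_a(image):
    # Placeholder for: context_a.execute_async_v2(bindings_a, stream_a.handle)
    # followed by stream_a.synchronize(). Each model needs its OWN
    # execution context and CUDA stream, or the GPU work serializes.
    time.sleep(0.2)  # simulated inference latency
    return ("A", f"detections for {image}")

def infer_model_b(image):
    # Placeholder for: context_b.execute_async_v2(bindings_b, stream_b.handle)
    time.sleep(0.2)
    return ("B", f"segmentation for {image}")

def infer_parallel(image):
    """Run both models on the same input image concurrently."""
    results = {}

    def run(fn):
        name, out = fn(image)
        results[name] = out

    threads = [threading.Thread(target=run, args=(fn,))
               for fn in (infer_model_a, infer_model_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    start = time.time()
    out = infer_parallel("frame_0")
    elapsed = time.time() - start
    print(out, f"{elapsed:.2f}s")  # ~0.2 s, not 0.4 s: the two calls overlap
```

Note that whether the two models actually overlap on the GPU depends on how much of the device each one already saturates; separate streams let the scheduler interleave the kernels, but they do not create extra compute.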

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version: AGX Orin 64GB devkit
CUDA Version: 11.4
Operating System + Version: Ubuntu 20
Python Version (if applicable): Python 3.10

I would also like to see how this can be done.

TensorRT is planning to add multi-GPU, parallel inference to the 2025 Q2 product roadmap. We will focus on datacenter GPUs first; edge platforms like Jetson will come later.