Description
Parallel inference of multiple models on the same input image. I have 2 models that need to run inference at the same time on each input frame. I tried multi-threading and multi-processing, but when inference runs on the GPU, the models execute sequentially (not in parallel). I also tried splitting inference between the DLA and the GPU, but the DLA does not support all layers, so inference time is very poor. Please tell me the best way to run parallel inference of multiple models on the same input image. Many thanks!!!
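For context, this is the dispatch pattern I am using in the multi-thread attempt: each model gets its own worker thread with its own input/output queue, and every frame is pushed to both workers. This is only a sketch with the actual inference stubbed out as plain Python callables (`infer_fn`); in the real code each worker would own its own TensorRT execution context and CUDA stream instead of the lambdas below. The names `make_worker` and `parallel_infer` are my own helpers, not library APIs.

```python
import threading
import queue

# Stand-ins for per-model inference. In the real setup each worker would
# hold its own TensorRT execution context and CUDA stream; here infer_fn
# is just a plain callable so the pattern itself is runnable anywhere.
def make_worker(name, infer_fn, in_q, out_q):
    def run():
        while True:
            frame = in_q.get()
            if frame is None:        # poison pill: shut the worker down
                break
            out_q.put((name, infer_fn(frame)))
    return threading.Thread(target=run, daemon=True)

def parallel_infer(frame, workers_io):
    """Push the same frame to every model's queue, then gather one
    result per model."""
    for in_q, _ in workers_io:
        in_q.put(frame)
    return dict(out_q.get() for _, out_q in workers_io)

# --- demo with two dummy "models" standing in for the two engines ---
q1_in, q1_out = queue.Queue(), queue.Queue()
q2_in, q2_out = queue.Queue(), queue.Queue()
make_worker("model_a", lambda f: f * 2, q1_in, q1_out).start()
make_worker("model_b", lambda f: f + 1, q2_in, q2_out).start()

results = parallel_infer(10, [(q1_in, q1_out), (q2_in, q2_out)])
print(results)  # {'model_a': 20, 'model_b': 11}
```

The threads themselves do run concurrently here; the problem I see is that once the stubs are replaced with real GPU inference, the two models' kernels still serialize on the device.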
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version: AGX Orin 64GB devkit
CUDA Version: 11.4
Operating System + Version: Ubuntu 20
Python Version (if applicable): Python 3.10