Loading a new plan file while running inference

cmehrshad · June 8, 2020, 10:21pm

Description

Hello,

My question is: Is it possible to load a new plan file (without using it) to the GPU while continuously running inference?

I’m currently using the trtexec sample to execute plan files. Say I have two plan files, one for Inception and one for MobileNet. I want to load Inception and run multiple inferences, and in the meantime load the MobileNet plan file to hide the loading latency, so that MobileNet is already loaded and can be used right away later.

So:

1. Is it possible to load a new model while another one is busy running inferences (without interrupting it)?
2. If yes, can we hide the latency of loading the new model? Can we load it in such a way that the latency of the old model is not impacted?

Thanks,

Environment

TensorRT Version : 7.1
GPU Type : 512-Core Volta GPU with Tensor Cores
Nvidia Driver Version :
CUDA Version : 10.2
CUDNN Version : 8.0
Operating System + Version : Jetpack 4.4
Python Version (if applicable) : 3.6
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

SunilJB · June 9, 2020, 4:05am

Hi,

To run multiple models with TensorRT, I recommend using either NVIDIA DeepStream or NVIDIA Triton Inference Server.
Please refer to the link below for more details:

If you want to use multithreading with TensorRT, please refer to the link below for best practices:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-700/tensorrt-best-practices/index.html#thread-safety
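As a rough illustration of the multithreading approach, here is a minimal Python sketch that deserializes the next plan file in a worker thread while the main thread keeps running inference. It is a sketch under assumptions, not a tested recipe for this setup: the helper names `load_engine` and `infer_while_preloading` are hypothetical, `infer_once` stands in for whatever code runs the current engine, and `runtime` is assumed to be a `tensorrt.Runtime` created by the caller. Loading in the background does not by itself guarantee that the running model's latency is unaffected, since deserialization can still compete for CPU, memory, and GPU resources; that would need to be measured on the target device.

```python
from concurrent.futures import ThreadPoolExecutor


def load_engine(runtime, plan_path):
    """Read a serialized plan file and deserialize it into an engine.

    Assumption: `runtime` is a tensorrt.Runtime created by the caller, e.g.
    trt.Runtime(trt.Logger(trt.Logger.WARNING)), and is kept alive for as
    long as the returned engine is in use.
    """
    with open(plan_path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())


def infer_while_preloading(infer_once, inputs, loader, next_plan_path):
    """Run infer_once over inputs while loader(next_plan_path) runs in a worker thread.

    Both callables are supplied by the caller (hypothetical names).
    Returns (outputs, next_engine).
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start loading the next model in the background.
        pending = pool.submit(loader, next_plan_path)
        # Keep serving the current model on the calling thread.
        outputs = [infer_once(x) for x in inputs]
        # Blocks only if loading has not finished yet.
        next_engine = pending.result()
    return outputs, next_engine
```

With TensorRT installed, `loader` could be `functools.partial(load_engine, runtime)`, and `infer_once` would wrap the Inception engine's own execution context. The new engine should get its own execution context rather than sharing one across threads, in line with the thread-safety notes linked above.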

Thanks

