[TensorRT] Model inference speed reduction on Jetson Xavier AGX when using 2 models

Description

Dear forum developers,

I have two models: one for object detection and one for segmentation.
Each model has its own pre-processing because their input shapes are different.

I’ve serialized and quantized both models to INT8 following NVIDIA’s sample code, and they run successfully.
The INT8 quantized engines are much faster than the FP32/FP16 ones.

The problem happens when I try to run both models at the same time.

When run individually, model A takes about 3 ms and model B about 4 ms. When they run simultaneously, the inference times increase to about 6 ms (model A) and 10 ms (model B), respectively.
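
For reference, the way I run the two models simultaneously is roughly like the sketch below (heavily simplified; engine deserialization, buffer allocation, and error handling are omitted, and the function and variable names are placeholders):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <thread>

// One inference on a dedicated execution context and stream.
static void runOnce(nvinfer1::IExecutionContext* ctx, void** bindings, cudaStream_t stream)
{
    ctx->enqueueV2(bindings, stream, nullptr);  // asynchronous launch on this model's stream
    cudaStreamSynchronize(stream);              // wait for this model only
}

// "Simultaneous mode": both models are enqueued at the same time from separate host threads.
void runBothConcurrently(nvinfer1::IExecutionContext* ctxA, void** bindingsA, cudaStream_t streamA,
                         nvinfer1::IExecutionContext* ctxB, void** bindingsB, cudaStream_t streamB)
{
    std::thread tA(runOnce, ctxA, bindingsA, streamA);
    std::thread tB(runOnce, ctxB, bindingsB, streamB);
    tA.join();
    tB.join();
}
```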

I don’t understand what is happening here.
The pre-processing uses OpenCV-CUDA and the post-processing uses OpenGL, both of which also run on the GPU.
Could that be the problem?
I tried turning off the pre/post-processing; the inference time changed only slightly and was still much slower than single-model inference.
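
For context, the GPU pre-processing for each model is roughly like this (a rough sketch; the target size and normalization are placeholders for my actual pipeline):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>
#include <opencv2/imgproc.hpp>

// Upload, resize to the model's input shape, and normalize, all on the GPU.
void preprocess(const cv::Mat& frame, cv::cuda::GpuMat& netInput, cv::cuda::Stream& stream)
{
    cv::cuda::GpuMat gpuFrame, resized;
    gpuFrame.upload(frame, stream);                             // host -> device copy
    cv::cuda::resize(gpuFrame, resized, cv::Size(640, 640),     // placeholder input shape
                     0, 0, cv::INTER_LINEAR, stream);
    resized.convertTo(netInput, CV_32FC3, 1.0 / 255.0, 0.0, stream);  // scale to [0, 1]
}
```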

Please give me some advice.

Thanks.

Environment

TensorRT Version: TensorRT 8
GPU Type: NVIDIA Jetson Xavier AGX
Nvidia Driver Version:
CUDA Version: 11.4
CUDNN Version: 8.4
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

I think the Jetson team will be in a better position to help you, so I have moved this topic and added some tags.


Hi,

Could you double-check the CUDA and cuDNN versions?
For Jetson, CUDA 11.4 is not available yet.

If each model occupies all of the GPU resources by itself to reach 3 ms (model A) and 4 ms (model B), then when you deploy them concurrently they need to share the GPU resources, and there may also be some switching overhead.
The result of 6 ms (model A) and 10 ms (model B) seems acceptable.
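
One thing you could try is to wrap each enqueue with CUDA events and compare the measured GPU time in single mode and in concurrent mode; if the event-measured time also roughly doubles, the extra latency comes from GPU-side contention rather than host-side overhead. A rough sketch (function and variable names are placeholders):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>

// Measure the GPU time of one inference on its own stream with CUDA events.
float timedInference(nvinfer1::IExecutionContext* ctx, void** bindings, cudaStream_t stream)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);           // recorded when the stream reaches this point
    ctx->enqueueV2(bindings, stream, nullptr);
    cudaEventRecord(stop, stream);            // recorded when this model's kernels are done
    cudaEventSynchronize(stop);

    float ms = 0.f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed GPU time between the two events
    printf("GPU time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```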

Thanks.

You are right, the CUDA and cuDNN versions I listed are not correct.
(JetPack 4.6 is installed on the device.)

Could you explain why the inference speed slows down by roughly 2x?

Thanks.

Hi,

As mentioned above, when two models run concurrently, they need to share the GPU resources.
This is why the inference time of each model increases.
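
If you do not strictly need the two models to overlap, you could also enqueue them back-to-back on a single stream so that each inference has the whole GPU to itself; the total latency should then be roughly the sum of the single-model times instead of both being inflated. A rough sketch with placeholder names:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Run the two engines sequentially on one stream so they do not compete for the GPU.
void runBothSequentially(nvinfer1::IExecutionContext* ctxA, void** bindingsA,
                         nvinfer1::IExecutionContext* ctxB, void** bindingsB,
                         cudaStream_t stream)
{
    ctxA->enqueueV2(bindingsA, stream, nullptr);  // model A runs first
    ctxB->enqueueV2(bindingsB, stream, nullptr);  // model B starts after A finishes on the GPU
    cudaStreamSynchronize(stream);                // wait for both
}
```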

Thanks.
