Invoking TensorRT models on Jetson Xavier with threads performs slower than invoking them serially

I am using the TensorRT C++ API, and I keep a separate execution context and CUDA stream per thread so that the models can run in parallel on a Jetson Xavier. But the performance is actually slower than what I achieved with serial execution: invoking 8 MobileNetV2 models with threads took 160 ms on average, while serial execution took 110 ms. I think that with threads the models are being enqueued concurrently, but the streams are not actually running in parallel. I also tried the flags from the blog post https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/ but the results are similar.
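For context, my per-thread setup looks roughly like the sketch below (simplified and illustrative, not my actual code: engine deserialization, binding buffer allocation, and error checking are omitted, and names like `runModel` are placeholders):

```cpp
#include <thread>
#include <vector>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

// One execution context and one CUDA stream per thread, sharing one engine.
// `engine` and `buffers` are assumed to be set up elsewhere.
void runModel(nvinfer1::ICudaEngine* engine, void** buffers) {
    // Each thread owns its own context and stream so enqueues can overlap.
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Asynchronous enqueue; kernels from different streams may overlap,
    // but only if the GPU has spare SMs available.
    context->enqueueV2(buffers, stream, nullptr);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy();
}

void invokeAllModels(nvinfer1::ICudaEngine* engine,
                     std::vector<void**>& perModelBuffers) {
    std::vector<std::thread> workers;
    for (auto* buffers : perModelBuffers)
        workers.emplace_back(runModel, engine, buffers);
    for (auto& t : workers)
        t.join();
}
```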

The nvvp profiler results are as follows.
Serial invocation of 8 models:


Threaded invocation with 8 threads:

Hi,

Would you mind checking the GPU resources required by one model first?
The GPU utilization can be found with tegrastats:

$ sudo tegrastats

Please note that the Xavier NX has limited GPU resources.
If the models together require more than 99% of the GPU, they have to wait for resources in turn.

Thanks.