Multithreaded inference

I implemented an object detector class that contains the runtime context.
It runs well on a single thread (e.g. X fps).
But with multiple threads (e.g. N threads),
it runs at about X/N fps on each thread.
What should I check?

I have also checked the best-practices section of the TensorRT docs, but it is unclear to me.

Hi,

Please note that a separate CUDA stream is required for each parallel inference.
You can find an example in our trtexec binary directly:

/usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --streams=4
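For reference, here is a rough sketch of the same per-thread context/stream pattern in the C++ API. This is not from the original thread; it assumes a TensorRT 8.x serialized engine, and the file name "model.engine", the buffer sizes, and the binding order are placeholders for your own model:

// Sketch: one IExecutionContext and one cudaStream_t per worker thread,
// all sharing a single deserialized ICudaEngine.
#include <cstdio>
#include <fstream>
#include <thread>
#include <vector>
#include <cuda_runtime_api.h>
#include <NvInfer.h>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) printf("%s\n", msg);
    }
} gLogger;

void workerThread(nvinfer1::ICudaEngine* engine, size_t inputBytes, size_t outputBytes)
{
    // Each thread owns its own execution context and CUDA stream so that
    // work enqueued from different threads can overlap on the GPU.
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    void* bindings[2];
    cudaMalloc(&bindings[0], inputBytes);   // input binding (index 0 assumed)
    cudaMalloc(&bindings[1], outputBytes);  // output binding (index 1 assumed)

    for (int i = 0; i < 100; ++i)
    {
        // Asynchronous inference on this thread's private stream.
        // (Newer TensorRT releases use enqueueV3 instead.)
        context->enqueueV2(bindings, stream, nullptr);
        cudaStreamSynchronize(stream);
    }

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    cudaStreamDestroy(stream);
    delete context;  // use context->destroy() on older TensorRT releases
}

int main()
{
    // Deserialize the engine once; all threads share it.
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());

    // Placeholder buffer sizes; replace with your real binding sizes.
    std::thread t0(workerThread, engine, 3 * 224 * 224 * sizeof(float), 1000 * sizeof(float));
    std::thread t1(workerThread, engine, 3 * 224 * 224 * sizeof(float), 1000 * sizeof(float));
    t0.join();
    t1.join();

    delete engine;
    delete runtime;
    return 0;
}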

Additionally, please pay attention to the GPU workload of your detection model.
If it already reaches 99% utilization, the threads have to wait in turn for GPU resources.

Thanks.

Thanks for your reply.
Do I need to do anything special with the execution context?

[edit]
I generated my model using your command,
but the performance doesn't change.
While testing, I noticed that the model loading time on the other threads seems shorter than on the first thread. Does TensorRT use a cache?

Hi,

Sorry for the late update.
Could you share the detailed performance you observed with us?

For example, we got 8.61235 ms with --streams=1 and 8.74668 ms with --streams=4 for ResNet50.onnx.

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --streams=1
$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --streams=4

This indicates we can run inference on 4x the inputs concurrently while the elapsed time stays roughly the same.
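If you want to measure the same thing from your own application rather than trtexec, a small sketch like the one below can help. It assumes an existing execution context, device bindings, and stream (for example from the earlier sketch) and reports average per-inference latency with CUDA events:

// Sketch: average latency of `iterations` enqueues on one stream,
// measured with CUDA events so that it can be compared per thread.
#include <cuda_runtime_api.h>
#include <NvInfer.h>

float averageLatencyMs(nvinfer1::IExecutionContext* context,
                       void* const* bindings, cudaStream_t stream, int iterations)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    for (int i = 0; i < iterations; ++i)
        context->enqueueV2(bindings, stream, nullptr);  // asynchronous inference
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);  // wait until all enqueued work has finished

    float totalMs = 0.f;
    cudaEventElapsedTime(&totalMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return totalMs / iterations;
}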

Thanks.
