TensorRT threads affect each other during multithreaded inference

Description

According to my understanding of the NVIDIA Deep Learning TensorRT Documentation, it should be possible to use TensorRT engines concurrently from multiple threads. However, I did not get the expected results on the platform described below.
My C++ program starts multiple threads and initializes a TensorRT context and a CUDA stream in each thread. In my testing, I found that the time it takes one model to process a frame when two model threads are running is greater than the time it takes when only one model thread is running. The thread models interfere with each other: the more threads, the slower each model runs.
DeepStream and Triton Inference Server are not suitable for my business, so I need to integrate through the TensorRT API directly. I hope you can help me solve this problem.
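
For reference, the per-thread pattern I am using looks roughly like the sketch below (a minimal, simplified version, not the code from the attached archive; the engine file name, the FP32/static-shape assumption for the bindings, and the frame count are placeholders). The engine is deserialized once and shared, while each thread owns its own execution context and CUDA stream:

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <thread>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

void inferLoop(nvinfer1::ICudaEngine* engine) {
    // Each thread owns its own execution context and CUDA stream;
    // the shared engine is read-only at inference time.
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

    // Allocate one device buffer per binding (assumes static shapes, FP32).
    std::vector<void*> bindings(engine->getNbBindings(), nullptr);
    for (int i = 0; i < engine->getNbBindings(); ++i) {
        auto d = engine->getBindingDimensions(i);
        size_t vol = 1;
        for (int j = 0; j < d.nbDims; ++j) vol *= static_cast<size_t>(d.d[j]);
        cudaMalloc(&bindings[i], vol * sizeof(float));
    }

    for (int frame = 0; frame < 1000; ++frame) {  // placeholder frame count
        ctx->enqueueV2(bindings.data(), stream, nullptr); // async inference
        cudaStreamSynchronize(stream);                    // wait per frame
    }

    for (void* p : bindings) cudaFree(p);
    cudaStreamDestroy(stream);
    delete ctx;
}

int main(int argc, char** argv) {
    std::ifstream f("model.engine", std::ios::binary);    // placeholder path
    std::vector<char> blob((std::istreambuf_iterator<char>(f)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size());

    int nThreads = argc > 1 ? std::atoi(argv[1]) : 1;
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i) threads.emplace_back(inferLoop, engine);
    for (auto& t : threads) t.join();

    delete engine;
    delete runtime;
    return 0;
}
```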

Environment

TensorRT Version: TensorRT 8.2.5
GPU Type: RTX 3080
Nvidia Driver Version: 520.56.06
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: CentOS 7.9
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

This is the simplest test program; it contains the full C++ code and the model.
Tensort_thead_test1.tar.gz (71.9 MB)
And my ONNX model, in case you need it:
yolov5m.tar.gz (67.5 MB)

Steps To Reproduce

  1. Extract Tensort_thead_test1.tar.gz and enter the extracted directory.
  2. Open CMakeLists.txt and adjust the TensorRT and CUDA versions.
  3. Run ./build.sh
  4. Enter the build directory and run ./test <number of model threads>
    For example:
    ./test 1
    ./test 2
    View the average frame rate printed on the terminal

Hi,

The below links might be useful for you.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
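
In particular, a non-default stream created with the non-blocking flag avoids implicit synchronization with the legacy default stream; a minimal sketch:

```cpp
#include <cuda_runtime.h>

int main() {
    // A stream created with cudaStreamNonBlocking does not implicitly
    // synchronize with the legacy default stream (stream 0), so per-thread
    // work is not falsely serialized if anything in the process uses stream 0.
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
    // ... enqueue per-thread work on `stream` here ...
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```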

For multi-threading/streaming, we suggest you use DeepStream or Triton.

For more details, we recommend raising the query in the DeepStream forum

or

in the Triton Inference Server GitHub issues section.

Thanks!

I have read many related topics, and you always reply like this, but it does not help me. I have read the relevant documentation and confirmed that I am using the APIs correctly. I hope you can debug with the example I provided.

DeepStream and Triton do not fit our business, so we build on TensorRT's API.

Hi,

Could you please try the latest TensorRT version 8.6 and let us know if you still face the same issue?

Thank you.

I tried TensorRT 8.6.1 with cuDNN 8.9.0 and still had the same problem.

Hello, you haven't replied to me for a few days. Can you give me any useful help?

I have the same problem. Is there any way to solve it? Where do you think the code makes a wrong call?

Hi,

Sorry for the delay. Please allow us some time to try reproducing this issue.

Thank you.

Excuse me, have you checked the result?

Hi @jcy.152 ,

Apologies for the delayed response.
We were unable to run the issue repro successfully.
Could you please share a minimal working repro?

Thank you.

@spolisetty The example was provided by noblehill above. I tried it, but the problem still remains. Is there any way to solve it? This kind of multithreaded parallelism does not seem performant to me.

When each of multiple threads uses TensorRT, performance drops dramatically. I have known this for two years. I cannot believe you cannot reproduce it!

It may be a thread-scheduling issue: when a thread has no activity for too long, it loses the CPU, and when activity resumes it takes much longer to get the CPU back.
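
If scheduling latency is indeed the cause, one experiment worth trying (my assumption, not a confirmed fix) is to change how waiting host threads yield the CPU, via the CUDA device flags:

```cpp
#include <cuda_runtime.h>

int main() {
    // Must run before the CUDA context is created in this process.
    // cudaDeviceScheduleSpin keeps a waiting host thread busy-polling
    // (lower wake-up latency, higher CPU usage); the default heuristic
    // or cudaDeviceScheduleBlockingSync lets it sleep and pay a wake-up
    // cost when the GPU work completes.
    cudaSetDeviceFlags(cudaDeviceScheduleSpin);
    // ... create the worker threads, contexts, and streams as usual ...
    return 0;
}
```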

Do you currently use multithreading for inference? How do you handle it?

@spolisetty How should we handle this problem?