According to my understanding of the NVIDIA Deep Learning TensorRT Documentation, it should be possible to build TensorRT engines concurrently from multiple threads. However, I did not get the expected results on the platform described under Environment.
My C++ program starts multiple threads and initializes a TensorRT context and a CUDA stream in each thread. In my tests, the time one model takes to process a frame is greater when two model threads are running than when only one is running. The model threads appear to interfere with each other: the more threads there are, the slower each model becomes.
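A minimal sketch of how the per-thread frame time can be compared across thread counts. This is not the code from the attached archive: `average_frame_ms` and `fake_infer` are hypothetical names, and `fake_infer` only sleeps as a stand-in for the real enqueue-and-synchronize call on each thread's own context and stream.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical placeholder for one inference call. It sleeps instead of
// touching the GPU; in a real test this would enqueue work on the thread's
// own TensorRT context and wait on its CUDA stream.
void fake_infer() {
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
}

// Runs `work` `frames` times on each of `num_threads` threads and returns
// every thread's average time per call, in milliseconds.
std::vector<double> average_frame_ms(int num_threads, int frames,
                                     const std::function<void()>& work) {
    assert(num_threads > 0 && frames > 0);
    std::vector<double> avg_ms(num_threads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&avg_ms, &work, frames, t] {
            using clock = std::chrono::steady_clock;
            const auto start = clock::now();
            for (int i = 0; i < frames; ++i) {
                work();  // one "frame"
            }
            const std::chrono::duration<double, std::milli> total =
                clock::now() - start;
            // Each thread writes only its own slot, so no lock is needed.
            avg_ms[t] = total.count() / frames;
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return avg_ms;
}
```

Calling `average_frame_ms(1, n, work)` and `average_frame_ms(2, n, work)` with the real inference call as `work` gives the two numbers being compared here. If the threads did not affect each other, the per-thread averages would stay roughly the same as the thread count grows.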
DeepStream and triton-server are not suitable for my business needs, so I have to integrate through the TensorRT API directly. I hope you can help me solve this problem.
Environment
TensorRT Version: TensorRT-8.2.5
GPU Type: RTX3080
Nvidia Driver Version: 520.56.06
CUDA Version: cuda11.8
CUDNN Version: 8.6.0
Operating System + Version: centos7.9
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
This is the simplest test program; it contains the full C++ code and the model: Tensort_thead_test1.tar.gz (71.9 MB)
My ONNX model, in case you need it: yolov5m.tar.gz (67.5 MB)
Steps To Reproduce
Decompress Tensort_thead_test1.tar.gz and open the extracted directory.
Edit CMakeLists.txt to match your TensorRT and CUDA versions.
Run ./build.sh
Go to the build directory and run ./test followed by the number of model threads.
For example:
./test 1
./test 2
Compare the average frame rate printed on the terminal for each run.
I have checked a lot of related topics, and you always reply like this, but it does not help me. I have read the relevant documents and confirmed that I am using the API correctly. I hope you can debug with the example I provided.
@spolisetty The example was provided by noblehill above. I tried it, but the problem still remains. Is there any way to solve it? This kind of multithreaded parallelism doesn't seem performant to me.
When multiple threads each use TensorRT, performance drops dramatically. I have known about this for 2 years. I cannot believe you guys cannot reproduce it!!
It is probably a thread scheduling issue. If a thread has no activity for too long, it loses the CPU. When there is activity again, it takes much longer to get the CPU back and start over.
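One rough way to check this guess, as a sketch only: measure how much later than requested a thread resumes after being idle. `wakeup_overshoot_us` is a hypothetical helper, and the sleep merely stands in for a thread waiting with nothing to do; it does not reproduce how a thread waits on the GPU.

```cpp
#include <chrono>
#include <thread>

// Sleeps for `idle_ms` milliseconds and returns how many microseconds
// later than requested the thread actually resumed.
double wakeup_overshoot_us(int idle_ms) {
    using clock = std::chrono::steady_clock;
    const auto requested = std::chrono::milliseconds(idle_ms);
    const auto start = clock::now();
    std::this_thread::sleep_for(requested);  // thread is idle here
    const auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed - requested).count();
}
```

If the overshoot grows noticeably when more model threads are running, that would be consistent with the scheduling explanation. If it stays small, the slowdown is more likely somewhere else.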