According to my understanding of the NVIDIA Deep Learning TensorRT Documentation, it should be possible to build and run TensorRT engines concurrently from multiple threads. However, I did not get the expected results on the platform described below.
In my C++ program, I start multiple threads and initialize a TensorRT execution context and a CUDA stream in each thread. In my testing, the time one model takes to process a single frame is greater when two model threads are running than when only one model thread is running. The threads interfere with each other: the more threads I start, the slower each model becomes.
DeepStream and Triton Inference Server are not suitable for my use case, so I need to integrate directly with the TensorRT API. I hope you can help me solve this problem.
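For reference, here is a minimal sketch of the per-thread layout I am describing (not the exact code in the attachment). It assumes a single already-deserialized ICudaEngine shared by the worker threads, pre-allocated device buffers, and a fixed [input, output] binding order; workerThread, inputDev, outputDev, and numFrames are placeholder names.

#include <NvInfer.h>
#include <cuda_runtime_api.h>

// One worker per model thread: each thread owns its own execution context
// and CUDA stream; only the deserialized engine is shared between threads.
void workerThread(nvinfer1::ICudaEngine* engine, void* inputDev, void* outputDev, int numFrames)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    void* bindings[] = {inputDev, outputDev};  // binding order assumed: [input, output]

    for (int i = 0; i < numFrames; ++i) {
        context->enqueueV2(bindings, stream, nullptr);  // asynchronous inference on this thread's stream
        cudaStreamSynchronize(stream);                  // wait so per-frame time can be measured
    }

    cudaStreamDestroy(stream);
    delete context;  // TensorRT 8.x allows direct delete (destroy() is deprecated)
}

Each worker is launched with std::thread and joined at the end; with ./test 2, two such workers run concurrently against the same engine, and that is where the per-frame processing time increases.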
Environment
TensorRT Version: TensorRT 8.2.5
GPU Type: RTX 3080
Nvidia Driver Version: 520.56.06
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: CentOS 7.9
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
This is a minimal test program containing the full C++ code and model: Tensort_thead_test1.tar.gz (71.9 MB)
And my ONNX model, in case you need it: yolov5m.tar.gz (67.5 MB)
Steps To Reproduce
Decompress Tensort_thead_test1.tar.gz and enter the extracted directory
Open CMakeLists.txt and adjust the TensorRT and CUDA paths/versions for your system
Run ./build.sh
Enter the build directory and run ./test <number of model threads>
For example:
./test 1
./test 2
View the average frame rate printed to the terminal
I have checked many related topics, and the replies all point to the same documentation, which has not helped me. I have read the relevant documents and confirmed that I am using the API as described. I would appreciate it if you could debug with the example I provided.
@spolisetty The example provided by noblehill above is the one I tried, but the problem still remains. Is there any way to solve it? This kind of multithreaded parallelism does not seem performant to me.