I want to use C++ multithreading to run concurrent inference on one engine. It seems that the execution context or some other CUDA operation is blocking, but I am not sure. The time consumption almost doubles when I double the number of threads.
To help us debug this issue, can you share a small repro that demonstrates the performance degradation when more threads are launched? I don’t see thread launch/management code in your sample, so please include it as well.
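A repro of this kind mainly needs a thread launcher plus per-call timing. Below is a minimal sketch under stated assumptions: meanLatencyMs is a hypothetical helper (not from the original sample), and the `work` callback stands in for one inference call, for example the body of the per-image loop in thread_task.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical repro harness: launches numThreads threads, each calling
// work(threadIndex) iters times, and returns the mean per-call latency in
// milliseconds across all threads. Precondition: numThreads > 0, iters > 0.
double meanLatencyMs(int numThreads, int iters, const std::function<void(int)>& work) {
    std::vector<double> perThreadTotalMs(numThreads, 0.0);
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([t, iters, &work, &perThreadTotalMs] {
            using Clock = std::chrono::steady_clock;
            for (int i = 0; i < iters; ++i) {
                Clock::time_point start = Clock::now();
                work(t);  // thread index lets each thread use its own resources
                std::chrono::duration<double, std::milli> dt = Clock::now() - start;
                perThreadTotalMs[t] += dt.count();  // each thread writes only its own slot
            }
        });
    }
    for (std::thread& th : threads) th.join();

    double sum = 0.0;
    for (double ms : perThreadTotalMs) sum += ms;
    return sum / (static_cast<double>(numThreads) * iters);
}
```

If the mean latency at 10 threads comes out roughly twice the value at 5 threads, total throughput is not improving. A profiler timeline is then needed to tell whether the calls are being serialized (for example on blocking CUDA API calls) or the GPU is simply saturated.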
I’m getting the following compile errors:
root@f35f5b0dbe73:/mnt/test-trt# make
g++ -c main.cpp -g -Wall -std=c++11 -O2 -I./include -I/usr/local/cuda/include -I/usr/local/include/opencv4
main.cpp:9:13: error: 'string' is not a member of 'cv'
std::vector<cv::string> imgsName;
^
main.cpp:9:13: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:9:13: error: 'string' is not a member of 'cv'
std::vector<cv::string> imgsName;
^
main.cpp:9:13: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:9:23: error: template argument 1 is invalid
std::vector<cv::string> imgsName;
^
main.cpp:9:23: error: template argument 2 is invalid
main.cpp:11:85: error: 'string' is not a member of 'cv'
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:85: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:11:85: error: 'string' is not a member of 'cv'
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:85: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:11:95: error: template argument 1 is invalid
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:95: error: template argument 2 is invalid
main.cpp: In function 'void parseImgDir(const string&, std::vector<cv::Mat>&, int&)':
main.cpp:28:42: error: request for member 'push_back' in 'imgsName', which is of non-class type 'int'
imgsName.push_back(imgName);
^
main.cpp: In function 'void thread_task(const char*, int, int)':
main.cpp:54:65: error: invalid types 'int[int]' for array subscript
trtNetPtr->inference(imgs[imgIndex], imgsName[imgIndex], time_preprocess, time_swithContext, time_pureInfer, time_destroy);
^
Makefile:16: recipe for target 'main.o' failed
make: *** [main.o] Error 1
root@f35f5b0dbe73:/mnt/test-trt#
I'm getting the following errors. I recommend building your application in a docker container to isolate any dependency issues.
root@3fbbd400a988:/mnt/test-trt# make
g++ -c main.cpp -g -Wall -std=c++11 -O2 -I./include -I/usr/local/cuda/include -I/usr/local/include/opencv4
main.cpp:11:85: error: 'string' is not a member of 'cv'
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:85: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:11:85: error: 'string' is not a member of 'cv'
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:85: note: suggested alternatives:
In file included from /usr/include/c++/5/string:39:0,
from /usr/include/c++/5/random:40,
from /usr/include/c++/5/bits/stl_algo.h:66,
from /usr/include/c++/5/algorithm:62,
from ./include/tensorrtNet.hpp:3,
from main.cpp:3:
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
typedef basic_string<char> string;
^
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
/usr/include/c++/5/bits/stringfwd.h:74:33: note: 'std::__cxx11::string'
main.cpp:11:95: error: template argument 1 is invalid
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
main.cpp:11:95: error: template argument 2 is invalid
main.cpp: In function 'void parseImgDir(const string&, std::vector<cv::Mat>&, int&)':
main.cpp:28:42: error: request for member 'push_back' in 'imgsName', which is of non-class type 'int'
imgsName.push_back(imgName);
^
main.cpp: In function 'int main(int, char**)':
main.cpp:76:44: error: invalid initialization of reference of type 'int&' from expression of type 'std::vector<std::__cxx11::basic_string<char> >'
parseImgDir(argv[2], imgs, imgsName);
^
main.cpp:11:6: note: in passing argument 3 of 'void parseImgDir(const string&, std::vector<cv::Mat>&, int&)'
void parseImgDir(const std::string& imgDir, std::vector<cv::Mat>& imgs, std::vector<cv::string>& imgsName){
^
Makefile:16: recipe for target 'main.o' failed
make: *** [main.o] Error 1
root@3fbbd400a988:/mnt/test-trt#
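Every error in both logs traces back to `cv::string`. As the first error says, the installed OpenCV (note the opencv4 include path) has no member `cv::string`, so the template argument of `std::vector<cv::string>` is invalid. GCC then treats the parameter as `int&`, and the later `push_back`, subscript, and argument-passing errors follow from that. Replacing `cv::string` with `std::string` at main.cpp lines 9 and 11 should clear the whole cascade.

For illustration, here is an OpenCV-free sketch of the name-collecting half of parseImgDir using the corrected container type. The function name collectImageNames, the `.jpg` filter, and the use of POSIX `dirent.h` are assumptions, not the repo's actual code.

```cpp
#include <dirent.h>
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: collects "<imgDir>/<name>.jpg" paths into a
// std::vector<std::string> (the type cv::string was presumably meant to be).
void collectImageNames(const std::string& imgDir, std::vector<std::string>& imgsName) {
    DIR* dir = opendir(imgDir.c_str());
    if (dir == nullptr) return;  // missing or unreadable directory: collect nothing
    while (dirent* entry = readdir(dir)) {
        std::string name = entry->d_name;
        // keep only files ending in ".jpg" (this also skips "." and "..")
        if (name.size() > 4 && name.compare(name.size() - 4, 4, ".jpg") == 0)
            imgsName.push_back(imgDir + "/" + name);
    }
    closedir(dir);
    std::sort(imgsName.begin(), imgsName.end());  // readdir order is unspecified
}
```

In the real parseImgDir, each collected path would also be passed to cv::imread to fill the parallel `std::vector<cv::Mat>`.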
root@92cbe228f45f:/mnt/test-trt# make
g++ -c main.cpp -g -Wall -std=c++11 -O2 -I./include -I/usr/local/cuda/include -I/usr/local/include/opencv4
g++ -c tensorrtNet.cpp -g -Wall -std=c++11 -O2 -I./include -I/usr/local/cuda/include -I/usr/local/include/opencv4
g++ -o main main.o tensorrtNet.o -g -Wall -std=c++11 -O2 -I./include -I/usr/local/cuda/include -I/usr/local/include/opencv4 -L./lib -L/search/odin/gongzhenting/local/gcc-6.1.0/lib64 -L/usr/local/cuda-10.0/targets/x86_64-linux/lib/ -L/usr/local/lib -lcudnn -lcublas -lcudart -lopencv_core -lopencv_highgui -lopencv_imgproc -lopencv_imgcodecs -lpthread -lnvinfer -lnvparsers
root@92cbe228f45f:/mnt/test-trt# sh run.sh
TestData/1_(49).jpg
ERROR: ERROR: ERROR: UFFParser: Invalid UFF file, cannot be opened
UFFParser: Invalid UFF file, cannot be opened
ERROR: UFFParser: Invalid UFF file, cannot be opened
UFFParser: Invalid UFF file, cannot be opened
ERROR: UFFParser: Invalid UFF file, cannot be opened
ERROR: ERROR: tensorrtNet: Fail to parseERROR: ERROR: tensorrtNet: Fail to parse
tensorrtNet: Fail to parseWhoops, Unable to create engine with INPUT_H x INPUT_W:
1376 x 800
tensorrtNet: Fail to parseERROR:
Whoops, Unable to create engine with INPUT_H x INPUT_W: tensorrtNet: Fail to parse
Whoops, Unable to create engine with INPUT_H x INPUT_W: Whoops, Unable to create engine with INPUT_H x INPUT_W: 1376 x 800
1376Whoops, Unable to create engine with INPUT_H x INPUT_W: x 1376 x 800
1376 x 800
800
Segmentation fault (core dumped)
@NVES, the error you got above is caused by an incomplete east.uff file, probably because you fetched my repo with only git clone xxx. I uploaded the large east.uff file with git-lfs, so please install git-lfs (sudo yum install git-lfs if your system is CentOS) and then run “git lfs clone https://github.com/IvyGongoogle/test-trt.git” to get my complete repo with the correct east.uff.
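A clone made without git-lfs leaves a small text pointer in place of the real binary, so this failure mode can be detected before parsing. The sketch below assumes the standard Git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`. isGitLfsPointer is a hypothetical helper and is not part of the repo or of TensorRT.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns true if `path` looks like a Git LFS pointer
// stub (small text file) rather than the real binary model file.
bool isGitLfsPointer(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file: not a pointer, report separately
    const std::string prefix = "version https://git-lfs.github.com/spec/v1";
    std::string head(prefix.size(), '\0');
    in.read(&head[0], static_cast<std::streamsize>(prefix.size()));
    // a pointer file starts with exactly this prefix; real model bytes will not
    return in.gcount() == static_cast<std::streamsize>(prefix.size()) && head == prefix;
}
```

Calling something like this on east.uff before handing it to the UFF parser would turn the “Invalid UFF file, cannot be opened” error into a clear “run git lfs pull” message.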
engine->createExecutionContext(), context->destroy(), cudaStreamCreate(), and cudaStreamDestroy() should NOT run in parallel with context->enqueue() in other threads. They contain blocking CUDA API calls (such as cudaMemcpy(), mainly for cuDNN/cuBLAS initialization and destruction). After commenting out these function calls, I got a >30% performance improvement for context->enqueue() with 5 and 10 threads.
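One way to follow this advice without deleting the calls is to hoist them out of the parallel region. The sketch below assumes each thread owns one execution context and one stream. All creation happens serially before the threads start, and all destruction happens after they join, so only the enqueue-like hot path runs concurrently. To keep it compilable without TensorRT or CUDA, WorkerResources and inferOnce are hypothetical stand-ins, and the comments mark where the real calls named above would go.

```cpp
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for per-thread TensorRT/CUDA state. In the real program
// this would hold the IExecutionContext* from engine->createExecutionContext()
// and a cudaStream_t created with cudaStreamCreate().
struct WorkerResources {
    int id;
    std::size_t inferCalls;  // counts hot-path calls so the sketch is observable
};

// Stand-in for the hot path; in the real program this is where the worker
// would call context->enqueue(...) on its own stream.
void inferOnce(WorkerResources& res) { ++res.inferCalls; }

// Setup and teardown run on the calling thread only; worker threads run
// nothing but the hot loop, so no setup/teardown call overlaps inference.
std::vector<std::size_t> runWorkers(int numThreads, int itersPerThread) {
    // 1. Serial setup: createExecutionContext() / cudaStreamCreate() would go here.
    std::vector<std::unique_ptr<WorkerResources>> resources;
    for (int i = 0; i < numThreads; ++i)
        resources.emplace_back(new WorkerResources{i, 0});

    // 2. Parallel phase: each thread touches only its own WorkerResources.
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i) {
        WorkerResources* res = resources[i].get();
        threads.emplace_back([res, itersPerThread] {
            for (int k = 0; k < itersPerThread; ++k) inferOnce(*res);
        });
    }
    for (std::thread& t : threads) t.join();

    // 3. Serial teardown after every worker has joined:
    //    context->destroy() / cudaStreamDestroy() would go here.
    std::vector<std::size_t> counts;
    for (const auto& r : resources) counts.push_back(r->inferCalls);
    resources.clear();  // releases the stand-in resources
    return counts;
}
```

The design choice is simply to reuse one context and one stream per thread for every image. The separate time_swithContext and time_destroy timers in the inference() call suggest that the original code does this work per image inside the timed loop.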
Attached are the nvprof results after the blocking functions are removed. They show many kernels running in parallel and no blocking CUDA API calls. I encourage you to use nvprof if you'd like a quick check for any obvious performance issues.
Based on these points, this is not a TRT bug.
Attachment: T5_pure.zip (56.6 MB)