I have deployed a model with TensorRT behind a Python Flask server, like resnet_as_service, and tested it with 1k requests. Each inference processes only one image, so GPU utilization is very low, only 7%~15%.
I would like to know how to do parallel processing with TensorRT.
Does TensorRT support thread management? If so, I could use multiple engines to process different requests at the same time.
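For example, this is roughly what I have in mind (just an untested sketch; "resnet.engine" and the worker count are placeholders for my setup). I am not sure whether each thread needs its own engine, or whether one shared engine with one IExecutionContext per thread would be enough:

```python
# Sketch of the idea: deserialize the engine once, then create one
# execution context per worker thread (placeholders, untested).
import pycuda.autoinit  # creates and activates a CUDA context
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open("resnet.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

num_workers = 4  # placeholder: number of parallel request handlers
# All contexts share the engine's weights; each would be used by exactly
# one thread, since a single context cannot run two inferences at once.
contexts = [engine.create_execution_context() for _ in range(num_workers)]
```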
Greetings!
Thanks for your kind reply!
So, in order to get parallelism, I have to manage the dedicated pairs (one execution context plus one CUDA stream per worker) on my own, processing requests with a producer-consumer model.
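Something like this is what I plan to try (an untested sketch, assuming TensorRT 7+ for execute_async_v2 and PyCUDA; INPUT_SHAPE, OUTPUT_SIZE, the binding order, and "resnet.engine" are placeholders for my model). Flask handlers act as producers that enqueue jobs, and each worker thread consumes jobs with its own dedicated context/stream pair:

```python
# Producer-consumer sketch (untested): Flask views enqueue requests and
# each worker thread owns a dedicated execution-context + stream pair.
import queue
import threading

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
INPUT_SHAPE = (1, 3, 224, 224)  # placeholder: my ResNet input
OUTPUT_SIZE = 1000              # placeholder: my ResNet output size

cuda.init()
cuda_ctx = cuda.Device(0).make_context()  # current on the main thread
with open("resnet.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as rt:
    engine = rt.deserialize_cuda_engine(f.read())
cuda_ctx.pop()  # detach; each worker pushes it onto its own thread

jobs = queue.Queue()  # producers: Flask request handlers

def worker():
    cuda_ctx.push()  # make the shared CUDA context current in this thread
    trt_ctx = engine.create_execution_context()  # dedicated context
    stream = cuda.Stream()                       # dedicated stream
    d_in = cuda.mem_alloc(int(np.prod(INPUT_SHAPE)) * 4)  # float32 bytes
    d_out = cuda.mem_alloc(OUTPUT_SIZE * 4)
    h_out = cuda.pagelocked_empty(OUTPUT_SIZE, np.float32)
    try:
        while True:
            image, job = jobs.get()  # consume one request
            cuda.memcpy_htod_async(d_in, image, stream)
            # assumes binding 0 = input, binding 1 = output
            trt_ctx.execute_async_v2([int(d_in), int(d_out)], stream.handle)
            cuda.memcpy_dtoh_async(h_out, d_out, stream)
            stream.synchronize()
            job["result"] = h_out.copy()
            job["done"].set()  # wake the waiting Flask handler
    finally:
        cuda_ctx.pop()

for _ in range(4):  # e.g. four dedicated pairs
    threading.Thread(target=worker, daemon=True).start()

def infer(image):
    """Called from a Flask view; blocks until a worker finishes the job."""
    job = {"done": threading.Event()}
    jobs.put((np.ascontiguousarray(image, dtype=np.float32), job))
    job["done"].wait()
    return job["result"]
```

Is that roughly the right structure?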