How to make full use of the GPU?

I have deployed a model on TensorRT behind a Python Flask server, like resnet_as_sevice, and tested it with 1k requests. Each inference processes only one image, so GPU usage is very low, only 7%~15%.

I would like to know how to do parallel processing with TensorRT.
Does TensorRT support thread management? If so, I could use multiple engines to process different requests at the same time.

Is it possible to use TensorRT with tf-serving?


Regarding parallel processing with TensorRT, please refer to the GTC talk that discusses optimizing OpenNMT. Pages 33 and 34 show multi-stream execution and parallel kernel execution.

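For parallelism, each stream needs its own dedicated (CudaEngine, ExecutionContext) pair. Also make sure your kernels are not resource bound, since kernels that already saturate the GPU leave no room for work from other streams to run alongside them.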

Thanks for your kind reply!
So, to run inference in parallel, I have to manage the dedicated pairs on my own and process requests with a producer-consumer model.
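
A minimal sketch of that producer-consumer setup, using only the Python standard library. The names InferenceWorkerPool and make_fake_infer are hypothetical, and the inference function is a stand-in: in a real deployment each worker would presumably build its own (CudaEngine, ExecutionContext) pair and CUDA stream inside make_infer_fn, which is not shown here.

```python
import queue
import threading
from concurrent.futures import Future


class InferenceWorkerPool:
    """Producer-consumer pool: each worker thread owns one dedicated inference resource."""

    def __init__(self, make_infer_fn, num_workers=2):
        self._requests = queue.Queue()
        self._threads = []
        for worker_id in range(num_workers):
            thread = threading.Thread(
                target=self._worker_loop, args=(make_infer_fn, worker_id), daemon=True
            )
            thread.start()
            self._threads.append(thread)

    def _worker_loop(self, make_infer_fn, worker_id):
        # Build the per-worker resource inside the worker thread so it is never shared.
        infer = make_infer_fn(worker_id)
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: stop this worker
                break
            image, future = item
            try:
                future.set_result(infer(image))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, image):
        """Producer side: enqueue one request and return a Future for its result."""
        future = Future()
        self._requests.put((image, future))
        return future

    def shutdown(self):
        # One sentinel per worker, then wait for all of them to exit.
        for _ in self._threads:
            self._requests.put(None)
        for thread in self._threads:
            thread.join()


def make_fake_infer(worker_id):
    # ASSUMPTION: stand-in for building a dedicated engine/context pair.
    # It only computes a fake "label" so the sketch runs without a GPU.
    def infer(image):
        return {"worker": worker_id, "label": sum(image) % 10}

    return infer


pool = InferenceWorkerPool(make_fake_infer, num_workers=2)
futures = [pool.submit([i, i + 1]) for i in range(8)]
results = [f.result(timeout=5) for f in futures]
pool.shutdown()
```

In a Flask handler, the request thread would act as the producer: call pool.submit(image) and block on the returned future's result(), while the workers consume from the shared queue.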