When I run a single YOLOv3 model, one inference takes about 10 ms.
When I run 2 YOLOv3 models concurrently on a 2080 GPU, each in its own thread with its own CUDA stream, looping 10000 times, every inference takes about 20 ms.
Each YOLOv3 model uses about 2 GB of GPU memory (the 2080 has 8 GB), running with batch = 1.
How can I execute multiple models concurrently in multiple threads with multiple streams so that the average inference time stays at 10 ms?
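For reference, here is a minimal sketch of the setup described above, assuming TensorRT 7's C++ API: each thread owns its own `IExecutionContext` and its own `cudaStream_t`, and enqueues with `enqueueV2`. The `engine`, `bindings`, and `iterations` names are placeholders for this illustration (device buffers are assumed to be allocated per thread elsewhere).

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: one execution context + one CUDA stream per thread.
// Sharing a single IExecutionContext across threads is not thread-safe,
// so each inference thread must create its own.
void inferLoop(nvinfer1::ICudaEngine* engine, void** bindings, int iterations) {
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    for (int i = 0; i < iterations; ++i) {
        // Asynchronously enqueue inference on this thread's private stream.
        ctx->enqueueV2(bindings, stream, nullptr);
        // Wait only for this thread's work, not the whole device.
        cudaStreamSynchronize(stream);
    }

    cudaStreamDestroy(stream);
    ctx->destroy();
}
```

Note that separate streams only overlap work when the GPU has spare capacity: if one YOLOv3 inference already saturates the 2080's SMs, two models launched together will largely serialize at the hardware level, which is consistent with the observed ~20 ms per inference.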
TensorRT Version: TensorRT 7 (on the 2080) and TensorRT 5 (on the Titan Xp)
GPU Type: RTX 2080 (with TensorRT 7) and Titan Xp (with TensorRT 5)
Nvidia Driver Version: 10.0 (with TensorRT 7) and 9.0 (with TensorRT 5)
CUDA Version: 10.2 (with TensorRT 7) and 9.0 (with TensorRT 5)
CUDNN Version: 7.6.5 (with TensorRT 7)
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): NA
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): NA
Baremetal or Container (if container which image + tag): NA