I want to achieve parallel inference on two TensorRT engines. Could someone please point me to documentation or sample code?
E.g., Engine 1 takes 30 ms and Engine 2 takes 30 ms. I want to create a multi-threaded pipeline where both threads run simultaneously, so the combined execution finishes in roughly 30 ms rather than 60 ms.
Right now, I have created two threads, each with its own execution context, but the GPU execution is not happening in parallel.
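To illustrate the timing goal, here is a minimal sketch of the threading skeleton I have in mind. The TensorRT calls themselves are replaced by a placeholder sleep (the real version would call `context.execute_async_v2` on a per-thread CUDA stream, e.g. via pycuda); the engine names and the 30 ms latency are assumptions for illustration only:

```python
import threading
import time

def run_engine(engine_id, latency_s=0.030):
    # Placeholder for the real per-thread inference call, e.g.
    # context.execute_async_v2(bindings, stream.handle) followed by
    # stream.synchronize(). Here the 30 ms engine is simulated with a sleep.
    time.sleep(latency_s)

start = time.perf_counter()
threads = [threading.Thread(target=run_engine, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed_ms = (time.perf_counter() - start) * 1000
# ~30 ms if the two engines overlap, ~60 ms if they serialize
print(f"elapsed: {elapsed_ms:.1f} ms")
```

My understanding is that the CPU-side threading alone is not enough: both contexts must enqueue work on distinct CUDA streams, and even then the kernels only overlap if one engine does not saturate the GPU's SMs, so perfect 30 ms overlap is not guaranteed.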
TensorRT Version: 7.0.0-1+cuda10.0
GPU Type: RTX 2060
Nvidia Driver Version:
CUDA Driver Version / Runtime Version: 10.2 / 10.0
CUDA Capability Major/Minor version number: 7.5
CUDNN Version: -
Operating System + Version: Ubuntu 18.04.4 LTS
Python Version (if applicable): Python 3.6
Screenshot of visual profiler: