Running multiple different engines simultaneously with TensorRT C++

I wrapped GitHub - cyrusbehr/tensorrt-cpp-api: TensorRT C++ API Tutorial in a Classifier class, and I want to run several different ONNX models using an engine-pool design across multiple threads. However, I observed that TensorRT executes the engines serially on the GPU, not concurrently. Is there a way to achieve truly concurrent execution in TensorRT, or should I use Triton Inference Server for this purpose?
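For concreteness, here is a minimal sketch of the pattern I am attempting. The function and tensor names are placeholders, not code from the wrapped project; my understanding is that each thread needs its own IExecutionContext and cudaStream_t for the GPU work to even have a chance of overlapping.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <memory>

// Illustrative sketch (error checking omitted): one IExecutionContext and
// one cudaStream_t per worker thread. Contexts are not thread-safe, so each
// thread must own its own, while the ICudaEngine itself can be shared.
void runInference(nvinfer1::ICudaEngine* engine,
                  void* inputDev, void* outputDev,
                  char const* inputName, char const* outputName)
{
    // Per-thread execution context; the shared engine stays read-only.
    std::unique_ptr<nvinfer1::IExecutionContext> ctx{
        engine->createExecutionContext()};

    // Per-thread stream, so enqueues from different threads are not
    // serialized on the default stream.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Tensor names here are placeholders for my models' I/O tensors.
    ctx->setTensorAddress(inputName, inputDev);
    ctx->setTensorAddress(outputName, outputDev);

    ctx->enqueueV3(stream);          // asynchronous launch on this stream
    cudaStreamSynchronize(stream);   // wait only for this stream's work

    cudaStreamDestroy(stream);
}

// Intended use (illustrative): one thread per engine, e.g.
//   std::thread t1(runInference, engineA, inA, outA, "input", "output");
//   std::thread t2(runInference, engineB, inB, outB, "input", "output");
```

Even with a dedicated context and stream per thread like this, I still observe the engines executing one after another on the GPU rather than overlapping.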

Environment

TensorRT Version: 8.6
GPU Type: RTX 3060 and RTX 4060
NVIDIA Driver Version: 550
CUDA Version: 11.8
cuDNN Version: 8.7
Operating System + Version: Ubuntu 18.04+