TensorRT MultiThread with MultiGPU

Description

I am developing an application that uses multiple threads to run different TensorRT engines on different GPUs. Currently, every thread ends up bound to the same GPU, even though CUDA_VISIBLE_DEVICES=0,1,2 is set and I call cudaSetDevice() in each thread. Is there a clean way to do this? If possible, I'd like to avoid building a separate executable for each GPU/thread pair.
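
For reference, here is a stripped-down sketch of the pattern I'm trying to use (engine file names are placeholders and the inference loop is omitted, along with error checking):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

// One worker per GPU: set the device in this thread before creating
// any CUDA or TensorRT objects, then deserialize and run the engine.
void worker(int device, const std::string& enginePath) {
    cudaSetDevice(device);  // must precede all other CUDA/TensorRT calls

    std::ifstream file(enginePath, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // ... allocate device buffers, run inference in a loop, etc. ...

    context->destroy();
    engine->destroy();
    runtime->destroy();
}

int main() {
    std::vector<std::thread> threads;
    threads.emplace_back(worker, 0, "engine_gpu0.plan");  // placeholder path
    threads.emplace_back(worker, 1, "engine_gpu1.plan");  // placeholder path
    for (auto& t : threads) t.join();
    return 0;
}
```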

Environment

TensorRT Version: 7.1.3.4
GPU Type: Two V100s
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8
Operating System + Version: RHEL 8
Language: C++11

Hi,

The links below may be useful:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
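
In particular, per the thread-safety section, an ICudaEngine can be shared across threads, but each thread should create its own IExecutionContext and CUDA stream. A rough sketch (assuming the engine and binding buffers are set up elsewhere):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Sketch: one shared engine; each calling thread owns its own
// execution context and CUDA stream, per the thread-safety docs.
void inferOnOwnStream(nvinfer1::ICudaEngine* engine, void** bindings) {
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    context->enqueueV2(bindings, stream, nullptr);  // asynchronous launch
    cudaStreamSynchronize(stream);                  // wait for results

    cudaStreamDestroy(stream);
    context->destroy();
}
```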

For multi-threading/streaming, we suggest using DeepStream or Triton.

For more details, we recommend raising the query on the DeepStream forum, or in the issues section of the Triton Inference Server GitHub repository.

Thanks!