TensorRT multi-GPU inference with multiple threads

Description

Hello all. I am using TensorRT to run inference on an AI model in C++,
but I have run into a problem with multiple GPUs and multiple threads.

  • First, I build the TensorRT model from multiple threads (one thread per GPU).
  • Second, as far as I know, using TensorRT with multiple GPUs requires calling cudaSetDevice both when creating the engine and when running inference, like this:
cudaSetDevice(m_gpuIndex);

However, I found that while one thread is inside ‘cudaStreamCreate’, ‘cudaMemcpy’, ‘enqueueV2’ (on the inference context), or another CUDA call, if another thread enters one of these calls at the same time,
the program blocks. If I lock a mutex before every inference, it works, but the performance is bad. Could anyone help me?

Environment

TensorRT Version: 8.2.2.1
GPU Type: rtx-3070 (notebook)
Nvidia Driver Version: 470.74
CUDA Version: 11.1
CUDNN Version: 11.1
Operating System + Version: ubuntu 18.04 with linux kernel 5.4.0-99
Python Version (if applicable): no
TensorFlow Version (if applicable): no
PyTorch Version (if applicable): no
Baremetal or Container (if container which image + tag):

Relevant Files

Will provide later if needed.

Steps To Reproduce


Hi,
The links below might be useful to you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#stream-priorities
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.
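
In case a concrete starting point helps, here is a minimal sketch of the one-thread-per-GPU layout that the thread-safety link above points toward. It is plain C++ with no CUDA or TensorRT calls, so it only shows the threading structure. GpuWorker, init, and submit are hypothetical names made up for this illustration, and the comments mark where cudaSetDevice, a per-thread stream, and a per-thread execution context would go. The assumption is that each GPU gets one long-lived thread that binds to its device once, so no global mutex is held around inference.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper (not part of TensorRT or CUDA): one worker thread per GPU.
class GpuWorker {
public:
    using Task = std::function<void(int /*gpuIndex*/)>;

    // `init` runs exactly once on the worker thread, before any task.
    // Assumption: in a real program it would call cudaSetDevice(gpuIndex),
    // then create this thread's own cudaStream_t and IExecutionContext.
    GpuWorker(int gpuIndex, Task init) : gpuIndex_(gpuIndex) {
        assert(gpuIndex >= 0);
        thread_ = std::thread([this, init] { run(init); });
    }

    // Drains any queued tasks, then stops and joins the worker thread.
    ~GpuWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    GpuWorker(const GpuWorker&) = delete;
    GpuWorker& operator=(const GpuWorker&) = delete;

    // Queue one job for this GPU; the future is ready once the job has run.
    std::future<void> submit(Task task) {
        std::packaged_task<void()> job([this, task] { task(gpuIndex_); });
        std::future<void> done = job.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(job));
        }
        cv_.notify_one();
        return done;
    }

private:
    void run(const Task& init) {
        init(gpuIndex_);  // device binding happens here, once per thread
        for (;;) {
            std::packaged_task<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            // The queue lock is released before the job runs, and each worker
            // has its own queue, so work on one GPU never waits on another.
            job();
        }
    }

    int gpuIndex_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<void()>> tasks_;
    bool stop_ = false;
    std::thread thread_;
};
```

A caller would create one GpuWorker per device index and call submit() with a callable that runs enqueueV2 on that worker's own context and stream, then wait on the returned future. Whether this removes the blocking you see depends on where the stall actually comes from, so treat it as something to try rather than a confirmed fix.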

Thanks!