Description
Hello all. I am using TensorRT to run inference on an AI model in C++, but I have run into a problem with multiple GPUs and multiple threads.
- First, I build the TensorRT engines from multiple threads (one GPU per thread).
- Second, as I understand it, using TensorRT with multiple GPUs requires calling cudaSetDevice both when creating the engine and when running inference, like this:
cudaSetDevice(m_gpuIndex);
However, I found that when one thread is inside cudaStreamCreate, cudaMemcpy, enqueueV2 (on the inference context), or another CUDA call, and a second thread enters at the same time, the program blocks. If I take a mutex before every inference it works, but the performance is bad. Could anyone help me?
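For reference, here is a minimal sketch of the per-thread pattern I would expect to avoid the global mutex: each thread pins itself to its own GPU and owns a private stream and execution context, so no synchronization object is shared across threads. The function name inferWorker, the deviceBindings array, and the assumption that binding index 0 is the input are all mine, not from my real code; error checking is omitted for brevity.

```cpp
#include <cuda_runtime.h>
#include <NvInfer.h>

// Hypothetical per-GPU worker; the engine and device-side binding buffers
// are assumed to have been created for this device elsewhere.
void inferWorker(int gpuIndex, nvinfer1::ICudaEngine* engine,
                 void** deviceBindings, const void* hostInput,
                 size_t inputBytes)
{
    // Bind this thread to its GPU once; all later CUDA calls made on this
    // thread target this device.
    cudaSetDevice(gpuIndex);

    // Each thread owns its own stream and its own IExecutionContext.
    // Contexts must not be shared between threads, but one engine may
    // create several contexts.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    nvinfer1::IExecutionContext* ctx = engine->createExecutionContext();

    // Use the async copy on the thread-private stream instead of the
    // synchronous cudaMemcpy, which runs on the legacy default stream and
    // can serialize work from different threads.
    cudaMemcpyAsync(deviceBindings[0], hostInput, inputBytes,
                    cudaMemcpyHostToDevice, stream);
    ctx->enqueueV2(deviceBindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    ctx->destroy();
    cudaStreamDestroy(stream);
}
```

My understanding is that the blocking I see may come from the synchronous calls (cudaMemcpy, and streams created without cudaStreamNonBlocking) implicitly synchronizing with the legacy default stream, rather than from TensorRT itself, but I am not certain.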
Environment
TensorRT Version: 8.2.2.1
GPU Type: RTX 3070 (notebook)
Nvidia Driver Version: 470.74
CUDA Version: 11.1
CUDNN Version: 11.1
Operating System + Version: ubuntu 18.04 with linux kernel 5.4.0-99
Python Version (if applicable): no
TensorFlow Version (if applicable): no
PyTorch Version (if applicable): no
Baremetal or Container (if container which image + tag):
Relevant Files
— later, if needed.