Description
I am developing an application that uses multiple threads to run different TensorRT engines on different GPUs. Currently, every thread ends up bound to the same GPU, even though CUDA_VISIBLE_DEVICES=0,1,2 is set and I call cudaSetDevice() in each thread. Is there a clean way to do this? If possible, I'd like to avoid creating a separate executable for each GPU/thread pair.
Environment
TensorRT Version: 7.1.3.4
GPU Type: Two V100s
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8
Operating System + Version: RHEL 8
C++ Version: C++11