I am developing an application that uses multiple threads to run different TensorRT engines on different GPUs. Currently, every thread ends up on the same GPU, even though CUDA_VISIBLE_DEVICES=0,1,2 is set and I call cudaSetDevice in each thread. Is there a clean way to do this? If possible, I'd like to avoid creating a separate executable for each GPU/thread pair.
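To make the setup concrete, here is a minimal sketch of the one-thread-per-GPU pattern described above. It is not my actual application code: `run_one_thread_per_device`, `select_device`, and `work` are hypothetical names made up for illustration, and the device-selection step is passed in as a callback so the sketch compiles without the CUDA toolkit. The point it shows is that the device is selected inside each worker thread, before that thread does anything else, since the CUDA runtime tracks the current device per host thread.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not part of CUDA or TensorRT).
// Starts one worker thread per device index in [0, device_count).
// Each worker calls select_device(i) itself, then runs work(i) on the same thread.
// Assumption: in a real build, select_device would wrap cudaSetDevice(i).
void run_one_thread_per_device(int device_count,
                               const std::function<bool(int)>& select_device,
                               const std::function<void(int)>& work) {
    if (device_count <= 0) return;

    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(device_count));

    for (int device = 0; device < device_count; ++device) {
        workers.emplace_back([device, &select_device, &work] {
            // Select the device from inside the worker thread, not from the
            // thread that spawns it, and before any other GPU-related call.
            if (!select_device(device)) return;  // skip work if selection failed
            work(device);
        });
    }

    // Wait for every worker so the callbacks outlive all threads.
    for (std::thread& t : workers) t.join();
}
```

With the CUDA runtime, the call would presumably look like `run_one_thread_per_device(3, [](int d) { return cudaSetDevice(d) == cudaSuccess; }, load_and_run_engine);`, where `load_and_run_engine` is a placeholder for code that creates the TensorRT runtime, engine, and execution context inside the worker, after the device has been selected.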
Nvidia Driver Version: