Hi Nvidia Gurus!
I have two GTX 295 boards on RedHawk 5.3 Real Time 64-bit Linux (kernel 2.6.26.8). I am using the following script to set up exclusive compute mode. nvidia-smi runs continuously in the background as a client to keep the exclusive compute mode settings in force, so that each host application will (hopefully) end up on a different GPU:
# Nvidia forum states nvidia-smi must be running continuously in the background for a GPU mode to stay “set”
nvidia-smi -l -i 30 -lsa &
# Now actually set the modes to exclusive use by one host thread per GPU…
sudo nvidia-smi -g 0 -c 1
sudo nvidia-smi -g 1 -c 1
sudo nvidia-smi -g 2 -c 1
sudo nvidia-smi -g 3 -c 1
# Now list the compute modes we just set…
nvidia-smi -g 0 -s
nvidia-smi -g 1 -s
nvidia-smi -g 2 -s
nvidia-smi -g 3 -s
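As a cross-check, the compute mode can also be queried from the runtime API rather than from nvidia-smi; a minimal sketch (assuming CUDA 2.2 or later, where cudaDeviceProp exposes a computeMode field) would look like this:

/* Sketch: print the compute mode of each device as the CUDA runtime sees it.
 * Assumes CUDA 2.2+ (computeMode field in cudaDeviceProp). */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        const char *mode =
            (prop.computeMode == cudaComputeModeExclusive)  ? "exclusive" :
            (prop.computeMode == cudaComputeModeProhibited) ? "prohibited" :
                                                              "default";
        printf("GPU %d (%s): compute mode = %s\n", dev, prop.name, mode);
    }
    return 0;
}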
According to nvidia-smi, the GTX 295 cards report that they are set to exclusive compute mode. I then start two host applications. The first enumerates a total of 4 GPUs and selects GPU 0. The second starts, enumerates all 4 GPUs, then uses cudaSetDevice to select GPU 2. Both host applications then seem to be using GPU 0: I see its temperature spike much higher than the other 3 GPUs, which appear to be idling.

What am I doing wrong? Do I need to use the CUDA driver API instead of the CUDA runtime API (C for CUDA)?
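For reference, the device selection in the second application is along these lines (a simplified sketch rather than the exact code; cudaSetDevice is called before any other runtime call, and the error checking here is just for illustration):

/* Sketch: select GPU 2 before any other runtime call, then confirm which
 * device the runtime actually bound the context to. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int requested = 2, actual = -1;

    cudaError_t err = cudaSetDevice(requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                requested, cudaGetErrorString(err));
        return 1;
    }

    cudaFree(0);              /* force context creation on the chosen device */
    cudaGetDevice(&actual);
    printf("requested GPU %d, runtime reports GPU %d\n", requested, actual);

    /* ... allocations, kernel launches, etc. ... */
    return 0;
}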
Thanks,
Evan Wheeler