Exclusive compute mode doesn't work with multiple GTX 295s & 64-bit Linux

Hi Nvidia Gurus!

I have two GTX 295 boards on RedHawk 5.3 Real Time 64-bit Linux (kernel …). I am using the following script to set up exclusive compute mode. nvidia-smi runs continuously in the background as a client to enforce the exclusive compute mode settings, so that each host application will hopefully use a different GPU:

# The NVIDIA forum states that nvidia-smi must keep running in the
# background for a GPU compute mode to stay "set":
nvidia-smi -l -i 30 -lsa &

# Now actually set the modes to exclusive use by one host thread per GPU:
sudo nvidia-smi -g 0 -c 1
sudo nvidia-smi -g 1 -c 1
sudo nvidia-smi -g 2 -c 1
sudo nvidia-smi -g 3 -c 1

# Now list the compute modes we just set:
nvidia-smi -g 0 -s
nvidia-smi -g 1 -s
nvidia-smi -g 2 -s
nvidia-smi -g 3 -s

According to nvidia-smi, the GTX 295 cards report that they are set to exclusive compute mode. I then start two host applications. The first enumerates a total of 4 GPUs and selects GPU 0. The second starts, enumerates all 4 GPUs, then calls cudaSetDevice to select GPU 2. Both host applications then appear to be using GPU 0: its temperature spikes much higher than the other 3 GPUs, which appear to be idling. What am I doing wrong? Do I need to use the CUDA driver API instead of C for CUDA?
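For reference, here is a minimal sketch (assuming a CUDA 2.2 or later runtime, which is where compute modes were introduced) that queries the compute mode each device actually reports to the runtime. If the nvidia-smi settings took effect, every device should print "exclusive":

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        /* computeMode arrived with the compute-exclusive feature in CUDA 2.2 */
        const char *mode =
            prop.computeMode == cudaComputeModeExclusive  ? "exclusive" :
            prop.computeMode == cudaComputeModeProhibited ? "prohibited" :
                                                            "default";
        printf("device %d (%s): compute mode %s\n", dev, prop.name, mode);
    }
    return 0;
}

If this still prints "default" for some devices, the nvidia-smi settings are not reaching the driver in the first place.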


Evan Wheeler

Remove the cudaSetDevice call.

What is the output of "/sbin/lsof /dev/nvidi* | grep mem" when the two copies of the application are running?
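To illustrate what removing cudaSetDevice buys you, here is a minimal sketch, assuming the runtime's documented exclusive-mode behaviour: when no device is forced, context creation falls through to a device that is still free, and cudaGetDevice then tells you which device this process was bound to:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* No cudaSetDevice() here: with all devices in exclusive compute
       mode, the runtime should skip devices that already have a
       context and bind to a free one when the context is created. */
    cudaError_t err = cudaFree(0);   /* idiom to force context creation */
    if (err != cudaSuccess) {
        fprintf(stderr, "no free GPU: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int dev = -1;
    cudaGetDevice(&dev);
    printf("this process was bound to device %d\n", dev);
    return 0;
}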

I did remove cudaSetDevice, but it made no difference…

Here’s the output when the two apps are running:

[syssw@dvds1 SdlMultiStream]$ lsof /dev/nvidi* | grep mem

SdlMultiS 16093 syssw mem CHR 195,1 10276 /dev/nvidia1
SdlMultiS 16093 syssw mem CHR 195,3 10275 /dev/nvidia3
SdlMultiS 16141 syssw mem CHR 195,1 10276 /dev/nvidia1
SdlMultiS 16141 syssw mem CHR 195,3 10275 /dev/nvidia3
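One way to correlate these /dev/nvidiaN nodes with CUDA device ordinals (the two numberings are not guaranteed to match) is to print each device's PCI address. This sketch assumes a toolkit new enough that cudaDeviceProp carries the pciBusID/pciDeviceID fields, which are not present in the earliest 2.x releases:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* /dev/nvidiaN minors typically follow PCI probe order, which
           need not match CUDA's device numbering */
        printf("CUDA device %d: %s at PCI %02x:%02x\n",
               dev, prop.name, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}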