I am using CUDA_VISIBLE_DEVICES=0, yet the process ends up using GPU 2 instead of GPU 0. So now I have two processes on GPU 2, as shown below. Note that this seems to happen randomly sometimes. Any clues or hints would be appreciated.
I am using CUDA 7.0, and I observed that if I set CUDA_VISIBLE_DEVICES to only “0”, then I cannot start the sample programs, even though GPU #0 is there according to nvidia-smi and is running my X windows.
I am resisting upgrading to 7.5 unless there is some certainty that it would solve the problem.
But I have seen posts that say they should correspond one to one. Even if they don’t, shouldn’t they be consistent? Otherwise, what is the point of CUDA_VISIBLE_DEVICES if it is not honoured? How can one figure out the mapping between what nvidia-smi reports and what CUDA_VISIBLE_DEVICES refers to?
Yes, there should be a one-to-one correspondence or mapping (assuming you don’t make a system configuration change).
You haven’t provided the sequence of commands that you are issuing or a great many other details, so I was just pointing this out in case you didn’t already know it, and were expecting that a process launched with
CUDA_VISIBLE_DEVICES="0" ./my_task
would always end up on the device enumerated as zero by nvidia-smi.

That is not guaranteed to be the case. But if you launch such a process and it ends up on device 2 (as reported by nvidia-smi), then future commands of that same form will consistently end up on device 2 as well.
It’s not random.
Nor is it always guaranteed to be reversed. It is SYSTEM SPECIFIC.
In a given system, if you don’t make any configuration changes (changing the motherboard, changing the BIOS, changing the slots that cards are installed in, changing the OS, adding other PCIe devices, etc.), then there will be a fixed mapping from CUDA device enumeration to nvidia-smi device enumeration. But this is not guaranteed to be the mapping:
0:0
1:1
2:2
It might be:
2:0
1:1
0:2
It might also be:
1:0
2:1
0:2
Or any other arrangement that involves a 1:1 mapping.
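One way to figure out the mapping on a given system is to match devices by PCI bus ID, which both the CUDA runtime and nvidia-smi can report. A minimal sketch using the CUDA runtime API (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        char busId[32];
        // PCI bus ID string (e.g. "0000:05:00.0"), the same identifier
        // nvidia-smi shows in its "Bus-Id" column
        cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);
        printf("CUDA device %d -> PCI bus ID %s\n", dev, busId);
    }
    return 0;
}
```

Comparing this output against `nvidia-smi --query-gpu=index,pci.bus_id --format=csv` gives you the mapping directly. Relatedly, CUDA supports a `CUDA_DEVICE_ORDER=PCI_BUS_ID` environment variable that makes the runtime enumerate devices in PCI bus order, which typically matches nvidia-smi’s ordering (by default, CUDA orders devices fastest-first).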