I am using CUDA_VISIBLE_DEVICES=0, yet the process ends up using GPU 2 instead of GPU 0. So now I have two processes on GPU 2, as shown below. Note that this seems to happen randomly sometimes. Any clues or hints would be appreciated.
I am using CUDA 7.0, and I observed that if I set CUDA_VISIBLE_DEVICES to only “0”, then I cannot start the sample programs, even though GPU #0 is there according to nvidia-smi and is running my X windows.
I am resisting upgrading to 7.5 unless there is some certainty that it would solve the problem.
But I have seen posts that say they should correspond one to one. Even if they don’t, shouldn’t they be consistent? Otherwise, what is the point of CUDA_VISIBLE_DEVICES if it is not honoured? How can one figure out the mapping between what nvidia-smi reports and what CUDA_VISIBLE_DEVICES refers to?
Yes, there should be a one-to-one correspondence or mapping (assuming you don’t make a system configuration change).
You haven’t provided the sequence of commands that you are issuing or a great many other details, so I was just pointing this out in case you didn’t already know it, and were expecting that a process launched with
CUDA_VISIBLE_DEVICES="0" ./my_task
would always end up on the device enumerated as zero by nvidia-smi.

That is not guaranteed to be the case. But if you launch such a process and it ends up on device 2 (as reported by nvidia-smi), then future commands of that same form will consistently end up on device 2 as well.
It’s not random.
Nor is it always guaranteed to be reversed. It is SYSTEM SPECIFIC.
In a given system, if you don’t make any configuration changes (changing the motherboard, changing the BIOS, changing the slots that cards are installed in, changing the OS, adding other PCIe devices, etc.), then there will be a fixed mapping from CUDA device enumeration to nvidia-smi device enumeration. But this is not guaranteed to be the mapping:
0:0
1:1
2:2
It might be:
2:0
1:1
0:2
It might also be:
1:0
2:1
0:2
Or any other arrangement that involves a 1:1 mapping.
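One way to figure out the mapping on a given system is to match devices by PCI bus ID, which both the CUDA runtime and nvidia-smi can report. A minimal sketch using the CUDA runtime API (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        char busId[32];
        // PCI bus ID string (e.g. "0000:05:00.0"), the same identifier
        // nvidia-smi shows in its "Bus-Id" column
        cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);
        printf("CUDA device %d -> PCI bus ID %s\n", dev, busId);
    }
    return 0;
}
```

Comparing this output against `nvidia-smi --query-gpu=index,pci.bus_id --format=csv` gives you the mapping directly. Relatedly, CUDA supports a `CUDA_DEVICE_ORDER=PCI_BUS_ID` environment variable that makes the runtime enumerate devices in PCI bus order, which typically matches nvidia-smi’s ordering (by default, CUDA orders devices fastest-first).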