[0] server79:0[gpu:0] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-0
GPU-469a9254-9508-3bb6-e525-e9175fb1074f
[1] server79:0[gpu:1] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-1
GPU-e9fd56be-35ff-0730-8df2-29ecdddd4a76
[2] server79:0[gpu:2] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-2
GPU-b2f7e1c0-f402-adc5-e55e-b3a9c474c4f1
[3] server79:0[gpu:3] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-3
GPU-97abd657-3537-7891-7254-ec9a80fd67ea
[4] server79:0[gpu:4] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-4
GPU-0129a237-c1f9-b189-7678-52dfc928f093
[5] server79:0[gpu:5] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-5
GPU-176e0ad3-19a0-3d40-5fc1-a1ccfbb647c2
[6] server79:0[gpu:6] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-6
GPU-901bbfde-8e5d-4f33-563f-8bfafa738e60
[7] server79:0[gpu:7] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-7
GPU-a2c6fb9e-43d6-af2f-16f0-041e610246d8
[8] server79:0[gpu:8] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-8
GPU-ceea231c-4257-7af7-6726-efcb8fc2ace9
[9] server79:0[gpu:9] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-9
GPU-74a33432-784e-49ad-f20d-168db61b05b8
[10] server79:0[gpu:10] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-10
GPU-6f9b6acf-051b-f4e7-c773-a1699518bb24
[11] server79:0[gpu:11] (NVIDIA GeForce RTX 3060)
Has the following names:
GPU-11
GPU-6aa0af9e-a2be-88c8-d2b3-2240d25318d7
NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6
This means that monitoring and applying settings are impossible without some mapping of the actual device IDs: applying any kind of setting to gpu:0 in nvidia-settings actually applies it to device 8 in nvidia-smi.
Is this something new, or was it always like this?
Is this normal?
Is there a setting that would make the IDs match in both commands?
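For reference, nvidia-smi can dump its own index/UUID pairs with its standard query flags, which makes it easy to compare against the nvidia-settings listing above:

nvidia-smi --query-gpu=index,uuid,name --format=csv,noheader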
I have tried on multiple setups with 2 and 4 GPUs respectively and observed that the nvidia-smi IDs and UUIDs are the same as in the nvidia-settings output.
Can you please attach an nvidia bug report? We will see if it gives any clue.
I have the same issue on 3 other platforms with 12 GPUs each, but those are on Xeons.
I also have one other machine with 10 A4000s, and there everything works as expected. They all use the same ancient Ubuntu 18 image.
Unfortunately, the order of GPU enumeration is somewhat arbitrary; at least, it is not guaranteed to be consistent across these different APIs.
With nvidia-settings, the order of GPUs is influenced by which GPUs are used by which X screens, which in turn can be influenced by which GPU the system BIOS chose to POST. nvidia-smi, by design, is not influenced by X11; I think nvidia-smi's GPU ordering defaults to lowest-to-highest PCI bus ID.
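If you want to verify that ordering on a given system, pci.bus_id is one of nvidia-smi's standard query fields:

nvidia-smi --query-gpu=index,pci.bus_id,uuid --format=csv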
So, for better or worse, this is expected today. Changing it now would likely break others’ workflows.
The GPU UUID is intended to be the way to correlate GPUs across these different APIs. It is admittedly not the easiest to script, but I think you would need to use the GPU UUID to map between a GPU in nvidia-smi and a GPU in nvidia-settings.
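As a rough sketch of such a mapping (assuming the exact output formats shown earlier in this thread; the parsing here is illustrative, not an official tool):

#!/usr/bin/env bash
# Sketch: map each nvidia-settings gpu:N index to its nvidia-smi index via UUID.
declare -A smi_idx
while IFS=', ' read -r idx uuid; do
  smi_idx["$uuid"]="$idx"
done < <(nvidia-smi --query-gpu=index,uuid --format=csv,noheader)

gpu=""
while read -r line; do
  case "$line" in
    *'[gpu:'*)                    # e.g. "[0] server79:0[gpu:0] (...)"
      gpu="${line#*'[gpu:'}"
      gpu="${gpu%%]*}"
      ;;
    GPU-[0-9a-f][0-9a-f]*-*)      # the GPU-<uuid> name line
      echo "nvidia-settings gpu:$gpu -> nvidia-smi ${smi_idx[$line]}"
      ;;
  esac
done < <(nvidia-settings -q gpus)

On the 12-GPU box above, this would print lines like "nvidia-settings gpu:0 -> nvidia-smi 8", making the cross-tool mapping explicit.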
Yes, I have already done the mapping in a script, but it would be easier if nvidia-settings accepted a UUID as well as an ID as the argument that targets the desired GPU.
Then no mapping would be required. Probably just a warning saying that the IDs in nvidia-smi and nvidia-settings are not the same and that UUIDs should be used for consistency.