nvidia-smi and driver API: matching results from the two

Hi, sorry if this has been explained before,

I have a little program, using the “cu” driver API functions, which gets the device count, then enumerates the devices (I have 2 in the system). I’d like to also include the temperature obtained from the nvidia-smi command. The nvidia-smi command lists the devices in a different order than enumerating them with the driver API does.

nvidia-smi lists the devices as follows:


==============NVSMI LOG==============

Timestamp : Tue Oct 19 08:55:18 2010

GPU 0:
Product Name : Quadro NVS 295
PCI ID : 6fd10de
Temperature : 64 C
GPU 1:
Product Name : Tesla C2050
PCI ID : 6d110de
Board Serial :
Temperature : 66 C
ECC errors :
Single bit : 0
Double bit : 0
Total : 0
Aggregate single bit: 0
Aggregate double bit: 0
Aggregate total : 0

The program that enumerates the devices lists the devices as follows:


DEVICES FOUND:
Device 0: Tesla C2050: Compute Mode 0
: Bus 4: Device 0
Device 1: Quadro NVS 295: Compute Mode 0
: Bus 3: Device 0

This is in reverse order to the nvidia-smi command. The deviceQueryDrv sample lists in the same order as my program.

The device ID in my program is always 0. I was hoping that I could match on this with the output of the nvidia-smi command. Is this supposed to return a correct device ID?

Is the order of the nvidia-smi command in the Bus ID number order?

Do you have any other suggestions as to how to match up the nvidia-smi output, to allow me to extract the temperature from this output, and match it to the enumerated output I get using the driver api?

I don’t think the device name is good enough, as the next machine will have a few C2050s in it, and then the device names will be the same.

thanks for any input.
Leo

Unfortunately that is how it is. There is no guarantee that the driver enumeration order (which is what nvidia-smi shows) and the CUDA library device enumeration agree. In fact they usually don’t on the multi-gpu linux machines I have. I vaguely recall tmurray hinting that this might get fixed in the future, but for now you are basically stuck, I fear.

Thanks for the quick response. I’m using driver version 3.1.

I was hoping that I could match the nvidia-smi output on the device ID returned from the driver API, as opposed to the enumeration order, but the device ID always returns 0. Maybe this is what tmurray was referring to. Is there any insight into the order that nvidia-smi uses to list the devices? Could it be the bus ID?

It’s not the bus ID. The way nvidia-smi lists the devices is the way the nvidia device driver enumerated them, which in turn depends on the PCI enumeration order of these devices.

So you’ve got no way to match entries until both the driver api and nvidia-smi return some unique per-gpu info like the serial number or something.

nvidia-smi reports PCI bus and device IDs as of 256.xx or 260.xx, I forget which. CUDA also reports PCI bus and device IDs since 256.xx.

Hence, problem solved!
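
For anyone landing here later, here is a minimal Python sketch of the matching step, assuming both sides can give you a PCI bus ID string. The helper names (`normalize_bus_id`, `parse_smi_csv`, `match_temperatures`, `read_smi_temperatures`) are made up for illustration. The `--query-gpu` and `--format` options used in `read_smi_temperatures` come from more recent nvidia-smi releases than the one shown in this thread, so treat that part as an assumption about your installation. The CUDA side is assumed to be a list of (ordinal, name, bus ID string) that your own driver API program produces, for example from the PCI bus and device ID attributes mentioned above.

```python
import re
import subprocess

_BUS_ID = re.compile(
    r"(?:([0-9a-fA-F]+):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-9a-fA-F])"
)

def normalize_bus_id(bus_id):
    """Turn '0000:04:00.0' or '00000000:04:00.0' into (domain, bus, device, function)."""
    m = _BUS_ID.fullmatch(bus_id.strip())
    if m is None:
        raise ValueError("unrecognized PCI bus ID: %r" % bus_id)
    domain, bus, dev, fn = m.groups()
    return (int(domain or "0", 16), int(bus, 16), int(dev, 16), int(fn, 16))

def parse_smi_csv(text):
    """Parse lines of 'bus_id, temperature' into {normalized bus ID: temp or None}."""
    temps = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        bus_id, temp = [field.strip() for field in line.split(",", 1)]
        # Some boards report no temperature (e.g. 'N/A'); keep them as None.
        temps[normalize_bus_id(bus_id)] = int(temp) if temp.isdigit() else None
    return temps

def match_temperatures(cuda_devices, smi_temps):
    """cuda_devices: iterable of (ordinal, name, bus_id_string) in CUDA order."""
    return [
        (ordinal, name, smi_temps.get(normalize_bus_id(bus_id)))
        for ordinal, name, bus_id in cuda_devices
    ]

def read_smi_temperatures():
    """Assumes a recent nvidia-smi that supports --query-gpu (not the 2010 one)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=pci.bus_id,temperature.gpu",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_smi_csv(out)
```

With the two boards from the first post you would call something like `match_temperatures([(0, "Tesla C2050", "0000:04:00.0"), (1, "Quadro NVS 295", "0000:03:00.0")], read_smi_temperatures())`. The bus ID is normalized to integers because the two tools may pad the PCI domain to different widths.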

There is an alternative way to resolve this problem: you can write your own analog of nvidia-smi using NVML, or you can use my analog of nvidia-smi, GitHub - smilart/nvidia-cdl.

How the program works is explained in the repo description.
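
To give an idea of what the NVML route might look like, here is a rough Python sketch. It is not taken from the repo above. It assumes the `pynvml` bindings (a separate package, not part of the driver) and only uses the NVML calls for initialization, device count, handle by index, PCI info and temperature. The function name `temperatures_by_bus_id` is hypothetical, and the bindings are passed in as a parameter so the function does not depend on them at import time.

```python
def temperatures_by_bus_id(nvml):
    """Return {(domain, bus, device): temperature in C} for every GPU NVML sees.

    `nvml` is expected to be the pynvml module (or any object exposing the
    same function names); passing it in is a choice made for this sketch.
    """
    nvml.nvmlInit()
    try:
        result = {}
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            pci = nvml.nvmlDeviceGetPciInfo(handle)
            temp = nvml.nvmlDeviceGetTemperature(handle, nvml.NVML_TEMPERATURE_GPU)
            # Key on PCI location, not on the NVML index, because the NVML
            # index need not agree with the CUDA device ordinal.
            result[(pci.domain, pci.bus, pci.device)] = temp
        return result
    finally:
        nvml.nvmlShutdown()
```

Usage would be roughly `import pynvml; temps = temperatures_by_bus_id(pynvml)`, then looking up each CUDA device by the PCI domain, bus and device IDs that the driver API reports for it.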