We have K40, K20, and a few M2090 GPUs on our server. The strange thing is that nvidia-smi always reports a power draw of 0 W for the M2090 GPUs, even when they are busy. The same problem occurs with the NVML API (expected, since nvidia-smi is built on NVML). Non-zero power values are reported correctly for the K40 and K20, though. More info:
SL6.3 Linux,
ECC on,
persistence mode on,
CUDA 7.0
Can you show sample output from running nvidia-smi -q for the M2090? My memory is hazy, but I believe the M2090 does not include the sensors required for real-time power measurements. If that is the case, though, nvidia-smi should show “N/A” instead of “0 W”. Is the behavior consistent across all M2090s? It would be good to exclude the possibility of a defective sensor on a particular card before you proceed to file a bug against nvidia-smi.
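For checking whether the reading is consistent across all cards directly through NVML, something like the following might help. It is only a rough sketch and assumes the pynvml Python bindings are installed; the helper names describe_reading and query_all_gpus are made up for illustration. nvmlDeviceGetPowerUsage returns milliwatts, and an NVML error is shown here as N/A so that it can be told apart from a genuine 0 W reading.

```python
def describe_reading(name, power_mw):
    """Format one power reading.

    power_mw is the raw NVML value in milliwatts, or None when NVML
    returned an error instead of a reading.
    """
    if power_mw is None:
        return f"{name}: N/A (no reading available)"
    watts = power_mw / 1000.0
    # A reported 0 W is flagged, since even an idle GPU draws some power.
    note = "  <-- suspicious: 0 W reported" if power_mw == 0 else ""
    return f"{name}: {watts:.2f} W{note}"


def query_all_gpus():
    """Return one formatted line per GPU visible to NVML (hypothetical helper)."""
    import pynvml  # assumption: pynvml bindings are installed on the server

    pynvml.nvmlInit()
    try:
        lines = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            try:
                power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)
            except pynvml.NVMLError:
                power_mw = None  # reported as N/A rather than 0 W
            lines.append(describe_reading(f"GPU {i} {name}", power_mw))
        return lines
    finally:
        pynvml.nvmlShutdown()
```

Calling query_all_gpus() on the server and printing the returned lines, once idle and once under load, should show whether every M2090 reports 0 W or only some of them.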
Hi njuffa, nice to see you again. I grepped the power-related output:
Tesla M2090
Power Readings
Power Management : Supported
Power Draw : 0.00 W
Power Limit : 225.00 W
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Power Management Object : 4.0
Tesla M2090
Power Readings
Power Management : Supported
Power Draw : 0.00 W
Power Limit : 225.00 W
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Power Management Object : N/A
GeForce GTX TITAN
Power Readings
Power Management : Supported
Power Draw : 24.46 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 150.00 W
Max Power Limit : 265.00 W
Power Management Object : 4.0
Tesla M2090
Power Readings
Power Management : Supported
Power Draw : 0.00 W
Power Limit : 225.00 W
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Power Management Object : N/A
SW Power Cap : Not Active
Tesla K40c
Power Readings
Power Management : Supported
Power Draw : 20.58 W
Power Limit : 235.00 W
Default Power Limit : 235.00 W
Enforced Power Limit : 235.00 W
Min Power Limit : 150.00 W
Max Power Limit : 235.00 W
Power Management Object : N/A
SW Power Cap : Not Active
Tesla K20c
Power Readings
Power Management : Supported
Power Draw : 17.24 W
Power Limit : 225.00 W
Default Power Limit : 225.00 W
Enforced Power Limit : 225.00 W
Min Power Limit : 150.00 W
Max Power Limit : 225.00 W
Power Management Object : 4.0
Tesla M2090
Power Readings
Power Management : Supported
Power Draw : 0.00 W
Power Limit : 225.00 W
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Strange that one M2090 shows “Power Management Object” as “4.0” while others show it as “N/A”. Maybe different VBIOS versions are used? In light of your previous measurements and the correctly displayed power limit, what you are seeing now would appear to be a regression in NVML (and/or nvidia-smi), so I would suggest filing a bug right away using the bug reporting form linked from the CUDA registered developer website.
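If it helps when putting the bug report together, a grepped excerpt like the one above can be summarized per card with a few lines of Python. This is a minimal sketch that assumes the excerpt has the layout shown in this thread (a GPU name on its own line, then "key : value" lines); parse_power_excerpt and zero_draw_gpus are hypothetical helper names, not part of any NVIDIA tool.

```python
def parse_power_excerpt(text):
    """Split a grepped nvidia-smi excerpt into one dict per GPU."""
    gpus = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "Power Readings":
            continue  # skip blank lines and the section header
        if " : " in line:
            if current is not None:
                key, value = line.split(" : ", 1)
                current[key.strip()] = value.strip()
        else:
            # A line without "key : value" is taken to be a GPU name.
            current = {"name": line}
            gpus.append(current)
    return gpus


def zero_draw_gpus(gpus):
    """Names of GPUs whose reported power draw is exactly 0.00 W."""
    return [g["name"] for g in gpus if g.get("Power Draw") == "0.00 W"]


sample = """Tesla M2090
Power Readings
Power Draw : 0.00 W
Power Limit : 225.00 W
Tesla K40c
Power Readings
Power Draw : 20.58 W
Power Limit : 235.00 W
"""
print(zero_draw_gpus(parse_power_excerpt(sample)))
```

The same dicts also carry the "Power Management Object" field when it is present in the excerpt, so the cards showing 4.0 and N/A can be listed side by side.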