nvidia-smi reports 0 W power draw even in P0 state

Dear forum,

We have K40, K20, and a few M2090 GPUs on our server. Strangely, nvidia-smi always reports 0 W power draw for the M2090 GPUs, even when they are busy. The same problem occurs with the NVML API (expected, since nvidia-smi is built on NVML). Non-zero power values are reported correctly for the K40 and K20, though. More info:

SL6.3 Linux,
ECC on,
persistence mode on,
CUDA 7.0

Any thoughts will be much appreciated!

Can you show sample output from running nvidia-smi -q for the M2090? My memory is hazy, but if I recall correctly the M2090 does not include the sensors required for real-time power measurements. However, if so, nvidia-smi should show “n/a” instead of “0 W”. Is the behavior consistent across all M2090s? It would be good to exclude the possibility of a defective sensor on a particular card before you proceed to file a bug against nvidia-smi.

Hi njuffa, nice to see you again. I grep-ped the power-related output:

Tesla M2090
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
        Power Management Object     : 4.0

Tesla M2090
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
        Power Management Object     : N/A

GeForce GTX TITAN
    Power Readings
        Power Management            : Supported
        Power Draw                  : 24.46 W
        Power Limit                 : 250.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 250.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 265.00 W
        Power Management Object     : 4.0

Tesla M2090
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
        Power Management Object     : N/A
        SW Power Cap                : Not Active

Tesla K40c
    Power Readings
        Power Management            : Supported
        Power Draw                  : 20.58 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 235.00 W
        Power Management Object     : N/A
        SW Power Cap                : Not Active

Tesla K20c
    Power Readings
        Power Management            : Supported
        Power Draw                  : 17.24 W
        Power Limit                 : 225.00 W
        Default Power Limit         : 225.00 W
        Enforced Power Limit        : 225.00 W
        Min Power Limit             : 150.00 W
        Max Power Limit             : 225.00 W
        Power Management Object     : 4.0

Tesla M2090
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A

As you can see, all the M2090s report a power draw of 0 W. It used to work well: in an old question I asked on SO (http://stackoverflow.com/questions/20040426/why-do-nvprof-and-nvidia-smi-report-different-results-on-power?noredirect=1#comment41559090_20040426), the M2090s showed non-zero results.

Thank you!

Strange that one M2090 shows “power management object” as “4.0” while the others show it as “n/a”. Maybe different VBIOS versions are used? In light of your previous measurements and the correctly displayed power limit, what you are seeing now would appear to be a regression in NVML (and/or nvidia-smi), so I would suggest filing a bug right away using the bug reporting form linked from the CUDA registered developer website.
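
For anyone who wants to run the same consistency check across many cards before filing a bug, here is a minimal sketch in Python. It assumes the input has the same grep-ped layout as the output posted above (an unindented product name followed by indented "key : value" lines), which is not the raw nvidia-smi -q format. The function names (parse_power_readings, watts, suspicious) and the SAMPLE text are hypothetical, made up for illustration; the sample values are copied from the thread.

```python
import re

def parse_power_readings(text):
    """Parse grep-style power output into a list of (name, fields) pairs."""
    devices = []
    fields = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: a product name starts a new device record.
            fields = {}
            devices.append((line.strip(), fields))
        elif ":" in line and fields is not None:
            # Indented "key : value" line belonging to the current device.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return devices

def watts(value):
    """Return the reading in watts for text like '225.00 W', else None (e.g. 'N/A')."""
    m = re.fullmatch(r"([0-9.]+) W", value)
    return float(m.group(1)) if m else None

def suspicious(devices):
    """Names of devices that report exactly 0 W draw despite a nonzero power limit."""
    flagged = []
    for name, fields in devices:
        draw = watts(fields.get("Power Draw", "N/A"))
        limit = watts(fields.get("Power Limit", "N/A"))
        if draw == 0.0 and limit:
            flagged.append(name)
    return flagged

# Assumption: two entries copied from the output in this thread.
SAMPLE = """\
Tesla M2090
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
Tesla K40c
    Power Readings
        Power Management            : Supported
        Power Draw                  : 20.58 W
        Power Limit                 : 235.00 W
"""

if __name__ == "__main__":
    for name in suspicious(parse_power_readings(SAMPLE)):
        print(name, "reports 0 W draw with a nonzero limit")
```

A card that shows 0 W while still advertising a valid power limit is the pattern seen on every M2090 here, so a list like this could be attached to the bug report. Note that a single snapshot cannot distinguish a missing sensor from a software regression; it only shows whether the behavior is consistent across cards.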