nvmlDeviceGetPowerUsage always returns 0mW for Tesla C2075

Hi all,

I’m trying to write a daemon for monitoring GPU power usage on a Tesla C2075. When using the NVML API, calls to nvmlDeviceGetPowerUsage always report 0 mW (with NVML_SUCCESS as the return value). Calls to other NVML functions, such as nvmlDeviceGetTemperature, report non-zero values and also return NVML_SUCCESS. The nvidia-smi tool’s output is consistent with my own application’s behavior. What am I doing wrong with my C2075? Do power readings require special motherboard support? Here’s a brief overview of my setup (I’ve got the Tesla C2075 in a desktop-class motherboard):

Motherboard : ASUS M5A97
CPU : AMD FX-8350 8-core processor
GPU : NVIDIA Tesla C2075
OS : Ubuntu Server 14.04

Thanks for your help!

Hello rflyerly,

I will see if I can replicate this issue and will update. It is likely not motherboard related.

If I may ask, what driver version are you using? Please post the output of:

nvidia-smi -a

-CCooper

Hi CCooper,

I’m using the CUDA 7 SDK w/ driver version 346.46. FYI, I’m using the GPU for compute and not for display, hence it’s on a headless machine. Here’s the output:

==============NVSMI LOG==============

Timestamp                           : Tue May 12 08:49:15 2015
Driver Version                      : 346.46

Attached GPUs                       : 1
GPU 0000:05:00.0
    Product Name                    : Tesla C2075
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : N/A
    Accounting Mode Buffer Size     : N/A
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0323111026458
    GPU UUID                        : GPU-0f5a98ac-6f37-52b8-0b05-63caa1e81b4e
    Minor Number                    : 0
    VBIOS Version                   : 70.10.46.00.05
    MultiGPU Board                  : No
    Board ID                        : 0x500
    Inforom Version
        Image Version               : N/A
        OEM Object                  : 1.1
        ECC Object                  : 2.0
        Power Management Object     : 4.0
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x05
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x109610DE
        Bus Id                      : 0000:05:00.0
        Sub System Id               : 0x091010DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 2
            Link Width
                Max                 : 16x
                Current             : 4x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : N/A
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 30 %
    Performance State               : P0
    Clocks Throttle Reasons         : N/A
    FB Memory Usage
        Total                       : 6143 MiB
        Used                        : 10 MiB
        Free                        : 6133 MiB
    BAR1 Memory Usage
        Total                       : N/A
        Used                        : N/A
        Free                        : N/A
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : N/A
        Decoder                     : N/A
    Ecc Mode
        Current                     : Disabled
        Pending                     : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 48 C
        GPU Shutdown Temp           : N/A
        GPU Slowdown Temp           : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 573 MHz
        SM                          : 1147 MHz
        Memory                      : 1566 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 573 MHz
        SM                          : 1147 MHz
        Memory                      : 1566 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None
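
For a monitoring daemon, a reading like the one above can be caught automatically. The following is only a minimal sketch, assuming the `nvidia-smi -a` text layout shown in this log; the function names `parse_power_readings` and `power_draw_looks_bogus` are hypothetical helpers, not part of NVML or nvidia-smi. It pulls out the "Power Readings" section and flags the case where power management is reported as supported but the draw is exactly 0 W.

```python
import re


def parse_power_readings(smi_text):
    """Return the key/value pairs listed under 'Power Readings'."""
    readings = {}
    in_section = False
    section_indent = 0
    for line in smi_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if line.strip() == "Power Readings":
            in_section = True
            section_indent = indent
            continue
        if in_section:
            if indent <= section_indent:
                break  # a line at the same or shallower indent starts the next section
            key, _, value = line.partition(":")
            readings[key.strip()] = value.strip()
    return readings


def power_draw_looks_bogus(readings):
    """True when power management is supported but the draw reads exactly 0 W."""
    if readings.get("Power Management") != "Supported":
        return False
    match = re.match(r"([0-9.]+)\s*W$", readings.get("Power Draw", ""))
    return match is not None and float(match.group(1)) == 0.0


# Excerpt copied from the log above.
sample = """\
    Temperature
        GPU Current Temp            : 48 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 0.00 W
        Power Limit                 : 225.00 W
    Clocks
        Graphics                    : 573 MHz
"""

readings = parse_power_readings(sample)
print(readings["Power Draw"], power_draw_looks_bogus(readings))
```

In a real daemon the text would come from running `nvidia-smi -a` instead of the embedded sample, and a flagged reading would be logged as suspect instead of being recorded as a genuine 0 W measurement.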

Thank you. I am currently seeing if I can obtain a Tesla C2075 to test this out, and will update as soon as I have more information.

We had some of our engineers look at this today. If they are reading, thanks again for the help.

We replicated what you are seeing on our end and have filed a bug for a long-term driver fix.

Driver 340.76, linked below, is a workaround for the issue. However, as one of our employees pointed out, if you take this route you will also have to roll back to CUDA 6.5, because CUDA 7.0 is only supported by driver 346.46 or newer.

ftp://download.nvidia.com/XFree86/Linux-x86_64/340.76/
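
The driver/CUDA constraint above can be checked in a script before downgrading. This is a rough sketch: the only minimum-driver entry in the table is the CUDA 7.0 one stated in this thread, and `parse_version`, `MIN_DRIVER_FOR_CUDA`, and `driver_supports_cuda` are hypothetical names chosen for illustration. Versions are compared as tuples of integers because comparing them as strings or floats can order them incorrectly.

```python
def parse_version(text):
    """Turn a dotted version such as '346.46' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum driver per CUDA release. Only the CUDA 7.0 entry comes from
# this thread; extend the table from the release notes as needed.
MIN_DRIVER_FOR_CUDA = {"7.0": "346.46"}


def driver_supports_cuda(driver, cuda):
    """True if `driver` meets the minimum listed for `cuda`.

    Raises KeyError for CUDA releases missing from the table.
    """
    return parse_version(driver) >= parse_version(MIN_DRIVER_FOR_CUDA[cuda])


print(driver_supports_cuda("346.46", "7.0"))  # the driver from the original report
print(driver_supports_cuda("340.76", "7.0"))  # the workaround driver
```

The second call returns False, which matches the note above that moving to 340.76 means rolling back from CUDA 7.0.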

Downgrading to that driver did the trick - thanks for your help!

Awesome, and thanks for finding this issue.