K20X SW Power-Cap engaging prematurely

This is a follow-up to my previous thread.
I have been having trouble getting the K20X to maintain peak performance. With plenty of help from this forum, I have traced the issue back to the SW Power Cap engaging. The problems are:

  1. The card is not overheating; it’s only at about 70C.
  2. The card is not even drawing too much power. The SW Power Cap is activating at ~90W.
    Further, the power cap fluctuates rapidly: the card sits in performance state P0 for 1 second, then P8 for 1 second, then P0 again for 1 second, and so on.
==============NVSMI LOG==============

Timestamp                       : Thu Jan 31 08:13:01 2013
Driver Version                  : 310.90

Attached GPUs                   : 4
GPU 0000:02:00.0
    Product Name                : Tesla K20Xm
    Display Mode                : Disabled
    Persistence Mode            : N/A
    Driver Model
        Current                 : TCC
        Pending                 : TCC
    Serial Number               : 0324912021549
    GPU UUID                    : GPU-42b69f80-9af6-d385-18fd-f4fc8d55daff
    VBIOS Version               : 80.10.17.00.02
    Inforom Version
        Image Version           : 2081.0200.01.09
        OEM Object              : 1.1
        ECC Object              : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current                 : Compute
        Pending                 : Compute
    PCI
        Bus                     : 0x02
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x102110DE
        Bus Id                  : 0000:02:00.0
        Sub System Id           : 0x097D10DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 2
            Link Width
                Max             : 16x
                Current         : 8x
    Fan Speed                   : N/A
    Performance State           : P0
    Clocks Throttle Reasons
        Idle                    : Not Active
        User Defined Clocks     : Active
        SW Power Cap            : Not Active
        HW Slowdown             : Not Active
        Unknown                 : Not Active
    Memory Usage
        Total                   : 5759 MB
        Used                    : 2246 MB
        Free                    : 3513 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
        Aggregate
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
    Temperature
        Gpu                     : 73 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 56.89 W
        Power Limit             : 235.00 W
        Default Power Limit     : 235.00 W
        Min Power Limit         : 150.00 W
        Max Power Limit         : 235.00 W
    Clocks
        Graphics                : 732 MHz
        SM                      : 732 MHz
        Memory                  : 2600 MHz
    Applications Clocks
        Graphics                : 732 MHz
        Memory                  : 2600 MHz
    Max Clocks
        Graphics                : 784 MHz
        SM                      : 784 MHz
        Memory                  : 2600 MHz
    Compute Processes
        Process ID              : 5860
            Name                : C:\Program Files\MATLAB\R2012b\bin\win64\MATLAB.exe
            Used GPU Memory     : 2229 MB
==============NVSMI LOG==============

Timestamp                       : Thu Jan 31 08:23:28 2013
Driver Version                  : 310.90

Attached GPUs                   : 4
GPU 0000:02:00.0
    Product Name                : Tesla K20Xm
    Display Mode                : Disabled
    Persistence Mode            : N/A
    Driver Model
        Current                 : TCC
        Pending                 : TCC
    Serial Number               : 0324912021549
    GPU UUID                    : GPU-42b69f80-9af6-d385-18fd-f4fc8d55daff
    VBIOS Version               : 80.10.17.00.02
    Inforom Version
        Image Version           : 2081.0200.01.09
        OEM Object              : 1.1
        ECC Object              : 3.0
        Power Management Object : N/A
    GPU Operation Mode
        Current                 : Compute
        Pending                 : Compute
    PCI
        Bus                     : 0x02
        Device                  : 0x00
        Domain                  : 0x0000
        Device Id               : 0x102110DE
        Bus Id                  : 0000:02:00.0
        Sub System Id           : 0x097D10DE
        GPU Link Info
            PCIe Generation
                Max             : 2
                Current         : 1
            Link Width
                Max             : 16x
                Current         : 8x
    Fan Speed                   : N/A
    Performance State           : P8
    Clocks Throttle Reasons
        Idle                    : Not Active
        User Defined Clocks     : Not Active
        SW Power Cap            : Active
        HW Slowdown             : Not Active
        Unknown                 : Not Active
    Memory Usage
        Total                   : 5759 MB
        Used                    : 2732 MB
        Free                    : 3027 MB
    Compute Mode                : Default
    Utilization
        Gpu                     : 99 %
        Memory                  : 100 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Disabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
        Aggregate
            Single Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
            Double Bit
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Texture Memory  : 0
                Total           : 0
    Temperature
        Gpu                     : 78 C
    Power Readings
        Power Management        : Supported
        Power Draw              : 107.21 W
        Power Limit             : 235.00 W
        Default Power Limit     : 235.00 W
        Min Power Limit         : 150.00 W
        Max Power Limit         : 235.00 W
    Clocks
        Graphics                : 732 MHz
        SM                      : 732 MHz
        Memory                  : 324 MHz
    Applications Clocks
        Graphics                : 732 MHz
        Memory                  : 2600 MHz
    Max Clocks
        Graphics                : 784 MHz
        SM                      : 784 MHz
        Memory                  : 2600 MHz
    Compute Processes
        Process ID              : 5860
            Name                : C:\Program Files\MATLAB\R2012b\bin\win64\MATLAB.exe
            Used GPU Memory     : 2715 MB
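
For anyone who wants to check their own logs for the same symptom, here is a minimal sketch that reads a snapshot in the format above and flags the case where SW Power Cap is Active while the power draw is well below the limit. The function names and the 0.8 margin are my own assumptions for illustration, not anything defined by nvidia-smi; the sample values are copied from the second snapshot.

```python
# Sketch: flag a snapshot where SW Power Cap is active far below the power limit.
# parse_snapshot / premature_power_cap are hypothetical helper names.

SAMPLE = """\
    Performance State           : P8
    Clocks Throttle Reasons
        SW Power Cap            : Active
    Power Readings
        Power Draw              : 107.21 W
        Power Limit             : 235.00 W
"""


def parse_snapshot(text):
    """Map 'Key : Value' lines to a dict, keeping the first occurrence of each key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip section headers and blank lines, which have no ' : '
            fields.setdefault(key.strip(), value.strip())
    return fields


def premature_power_cap(fields, margin=0.8):
    """True if the cap is active while draw is below margin * limit.

    margin=0.8 is an arbitrary threshold chosen for this sketch.
    """
    cap_active = fields.get("SW Power Cap") == "Active"
    draw = float(fields["Power Draw"].split()[0])    # "107.21 W" -> 107.21
    limit = float(fields["Power Limit"].split()[0])  # "235.00 W" -> 235.0
    return cap_active and draw < margin * limit


if __name__ == "__main__":
    print(premature_power_cap(parse_snapshot(SAMPLE)))
```

Because repeated keys such as "Current" or "Memory" appear in several sections of the full log, this sketch only trusts keys that are unique in the output (Performance State, SW Power Cap, Power Draw, Power Limit).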

Any help would be greatly appreciated.

The nvidia-smi output looks a little suspect.

I’ve sent you a private message with my contact info so we can pursue this further.

I am experiencing the same issue with my K20xm – can I ask what the resolution to this problem was? Thanks.

Bad news for you: it was a faulty card. :( The other possibility is that your card is overheating.

Can I ask how you identified the card was faulty? Was it a case of replacing the card and noticing the problem go away or was there a specific indicator of a faulty card? Thanks.

I had it RMA’d on suspicion of a fault, and replacing it made the problem go away.