MX150 graphics clock suddenly stuck at 427 MHz (with drivers 390.87 and 430.26)

Hi,

I’m running SuSE Linux Enterprise 15 on my Lenovo T580. It has an Optimus setup with an NVIDIA MX150. I started with driver 390.87 and got everything working via the nvidia + modesetting config and “xrandr --setprovideroutputsource modesetting NVIDIA-0;xrandr --auto”.

While running on AC power I set performance mode to maximum in nvidia-settings, ran “glmark2 --fullscreen” and came up with a score of about 2000. So far so good.

Now something has happened during the last two months that has cut the glmark2 score by more than 50% (score 900). I haven’t changed anything in my X config or the nvidia driver, but just comparing the first two benchmarks shows the difference (I recorded the output from the initial run):

Two months ago:

glmark2 2017.07

=======================================================
OpenGL Information
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce MX150/PCIe/SSE2
GL_VERSION: 4.6.0 NVIDIA 390.87

[build] use-vbo=false: FPS: 1990 FrameTime: 0.503 ms
[build] use-vbo=true: FPS: 3025 FrameTime: 0.331 ms

Today:

glmark2 2017.07

=======================================================
OpenGL Information
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce MX150/PCIe/SSE2
GL_VERSION: 4.6.0 NVIDIA 390.87

[build] use-vbo=false: FPS: 1236 FrameTime: 0.809 ms
[build] use-vbo=true: FPS: 1336 FrameTime: 0.749 ms
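
To put numbers on the difference between the two runs, here is a small Python sketch that computes the relative FPS drop for each test. The values are copied from the two outputs above; the helper name fps_drop_percent is only for illustration.

```python
# FPS values copied from the two glmark2 runs above.
before = {"use-vbo=false": 1990, "use-vbo=true": 3025}
today = {"use-vbo=false": 1236, "use-vbo=true": 1336}


def fps_drop_percent(old, new):
    """Relative FPS loss in percent, rounded to one decimal place."""
    return round((old - new) / old * 100, 1)


drops = {test: fps_drop_percent(before[test], today[test]) for test in before}

for test, drop in drops.items():
    print(f"[build] {test}: FPS down {drop}%")
# [build] use-vbo=false: FPS down 37.9%
# [build] use-vbo=true: FPS down 55.8%
```

So the first test lost a bit over a third of its frame rate, and the second one more than half.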

I reinstalled drivers and kernel, rebooted, and also tried the G05 driver for SLED 15, i.e. nvidia driver 430.26, but the benchmark results are the same. There is no compositing manager running (which would indeed cut the performance by 50%). I tried as root with “startx” and as user with login via gdm, using Gnome, KDE and basic WMs like twm and fvwm2. All the same.

What I see in nvidia-settings is that the graphics clock never goes above 427 MHz although it’s running in “performance level 3” with graphics clock max = 1911 MHz. In idle mode it’s at 370 MHz, with glmark2 running it goes up to 427 MHz. “Preferred Mode” is at “Prefer maximum Performance” all the time. Seems to be the same problem as in https://forum.manjaro.org/t/nvidia-card-stuck-at-low-clock-frequency-405mhz/85838. This worked fine initially: I remember watching the clock going up and down in adaptive mode and wondering how it jumped to almost maximum clock when just moving some windows around on the desktop.
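
Instead of watching nvidia-settings by eye, the clock can also be logged while the benchmark runs. Below is a minimal Python sketch that polls nvidia-smi once per second. It assumes nvidia-smi is on the PATH and that the driver reports the query fields clocks.gr, clocks.max.gr, pstate and utilization.gpu (see `nvidia-smi --help-query-gpu`). The function names and the "stuck" heuristic (GPU busy but clock below half of the maximum) are my own assumptions, not anything defined by the driver.

```python
import subprocess
import time

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = ["clocks.gr", "clocks.max.gr", "pstate", "utilization.gpu"]


def parse_sample(line):
    """Turn one CSV line such as '420, 1911, P0, 100' into a dict."""
    clock, max_clock, pstate, util = [part.strip() for part in line.split(",")]
    return {
        "clock_mhz": int(clock),
        "max_clock_mhz": int(max_clock),
        "pstate": pstate,
        "gpu_util_percent": int(util),
    }


def read_sample():
    """Run nvidia-smi once and parse the line for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def is_stuck(sample, busy_percent=90, low_fraction=0.5):
    """Heuristic (an assumption): GPU is busy but the clock stays low."""
    return (sample["gpu_util_percent"] >= busy_percent
            and sample["clock_mhz"] < low_fraction * sample["max_clock_mhz"])


def watch(seconds=30):
    """Print one sample per second, flagging samples that look stuck."""
    for _ in range(seconds):
        sample = read_sample()
        print(sample, "<- looks stuck" if is_stuck(sample) else "")
        time.sleep(1)
```

Calling watch() in one terminal while “glmark2 --fullscreen” runs in another should show whether the clock ever leaves the low range. If a field is reported as not supported on a given GPU, parse_sample will raise a ValueError and the field list needs adjusting.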

In Windows 10 the problem doesn’t exist, the clock jumps up to about 1500 MHz when starting some 3D programs, so it’s a Linux driver problem.

Using “PowerMizerEnable=0x1; PerfLevelSrc=0x3322; PowerMizerDefaultAC=0x1” in xorg.conf doesn’t help. nvidia-smi shows all “Clocks Throttle Reasons” as “Not Active”, so I’ve no idea what makes the GPU clock stay down. Below is the output of “nvidia-smi -q”.

Does anyone have any idea what could lock the GPU clock at this low speed?

cu,
Frank

==============NVSMI LOG==============

Timestamp                           : Sun Sep  8 23:16:25 2019
Driver Version                      : 430.26
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:02:00.0
    Product Name                    : GeForce MX150
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-742a84c2-d7f4-b399-d405-e97cabc40969
    Minor Number                    : 0
    VBIOS Version                   : 86.08.28.00.64
    MultiGPU Board                  : No
    Board ID                        : 0x200
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x02
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1D1010DE
        Bus Id                      : 00000000:02:00.0
        Sub System Id               : 0x00000000
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 4x
                Current             : 4x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 765000 KB/s
        Rx Throughput               : 12000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 2002 MiB
        Used                        : 65 MiB
        Free                        : 1937 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 100 %
        Memory                      : 40 %
        Encoder                     : N/A
        Decoder                     : N/A
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 58 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 97 C
        GPU Max Operating Temp      : 94 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : N/A
        Power Draw                  : N/A
        Power Limit                 : N/A
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 420 MHz
        SM                          : 420 MHz
        Memory                      : 3003 MHz
        Video                       : 1506 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1911 MHz
        SM                          : 1911 MHz
        Memory                      : 3004 MHz
        Video                       : 1708 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 26365
            Type                    : G
            Name                    : X
            Used GPU Memory         : 61 MiB
        Process ID                  : 28572
            Type                    : G
            Name                    : glmark2
            Used GPU Memory         : 3 MiB
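
Since the full log is long, here is a minimal Python sketch that pulls out only the fields relevant to this problem: the current and maximum graphics clock and any throttle reason marked “Active”. It assumes the indented “key : value” layout shown in the log above; parse_smi_query and clock_report are illustrative names of my own and not part of any NVIDIA tool.

```python
def parse_smi_query(text):
    """Return {(section, key): value} for every 'key : value' line."""
    fields = {}
    headers = []  # stack of (indent, name) for lines that carry no value
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("="):
            continue  # skip blank lines and the '====NVSMI LOG====' banner
        indent = len(raw) - len(raw.lstrip())
        line = raw.strip()
        # Drop headers that are not ancestors of the current line.
        while headers and headers[-1][0] >= indent:
            headers.pop()
        key, sep, value = line.partition(" : ")
        if not sep:
            headers.append((indent, line))  # section header such as 'Clocks'
            continue
        section = headers[-1][1] if headers else ""
        fields[(section, key.strip())] = value.strip()
    return fields


def clock_report(fields):
    """Return (current MHz, max MHz, list of active throttle reasons)."""
    current = int(fields[("Clocks", "Graphics")].split()[0])
    maximum = int(fields[("Max Clocks", "Graphics")].split()[0])
    active = [key for (section, key), value in fields.items()
              if section == "Clocks Throttle Reasons" and value == "Active"]
    return current, maximum, active


# Short excerpt from the log above, used to try the parser.
SAMPLE = """\
GPU 00000000:02:00.0
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
        SW Thermal Slowdown         : Not Active
    Clocks
        Graphics                    : 420 MHz
        SM                          : 420 MHz
    Max Clocks
        Graphics                    : 1911 MHz
"""

report = clock_report(parse_smi_query(SAMPLE))
print(report)
# (420, 1911, [])
```

The empty list is the puzzling part: the card sits at 420 MHz out of 1911 MHz in state P0 with 100 % utilization, and no throttle reason is reported.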

Did you upgrade the bios in the meantime? Windows 10 often does this with regular updates.

It shouldn’t have, as far as I could see, but to make sure I downgraded the BIOS to the previous version and then to the one before that (which is the one the laptop was delivered with), with no change.

If the BIOS had changed something, shouldn’t the problem then exist in Windows, too?

Not necessarily, Windows also runs on broken BIOSes.
Doesn’t seem to be the case here, though.
When running nvidia-settings, does it display “Power Source: AC” in the powermizer pane?
Does removing the battery and power adapter, then pressing and holding the power button for 20 seconds, thus discharging and resetting the mainboard, help?

Yes, the power source is AC, see here: https://www.bio.ifi.lmu.de/~steiner/nvidia-settings.jpg

The main battery is fixed inside the laptop, so I cannot remove it. I could open up the laptop and remove the CMOS battery (as described in the ThinkPad maintenance manual for removing the BIOS password). Would that do what you mean?

No, don’t do that.
You could try resetting the BIOS to default, though I doubt that will help.

Another thing that no longer works is setting the power modes in xorg.conf. I just switched to battery again, and the graphics card is always in “performance level 2” (I have 0 - 3, see screenshot above). Even with

Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x1"

which should set the performance level to 0. There is no difference between PowerMizerDefault=0x3 and 0x1, it is always at performance level 2.

So it seems I cannot control the card in any way anymore.

I can try to let the primary (fixed) battery run down to 0% and then do the 20 secs button press you proposed.

Solved! Your idea of resetting the mainboard was correct! I planned to reset the BIOS and was looking through the options to remember what I had changed when I noticed the option “disable built-in battery”.
Without AC power attached this would disconnect the battery and shut the laptop down until AC was reconnected. I did that and in this powered-off state I pressed the power key for 20 seconds.

And after rebooting, I got the full functionality back! The graphics clock goes up to 1911 MHz during benchmarks, the glmark2 score is > 2000 again and setting different profiles via xorg.conf works, too.

Thank you very very much for caring and giving the right idea! :)

Glad you’re back to normal operation. You should keep an eye on it, though. If this happens more frequently, it might point to the T580 being hit by a variant of an obscure BIOS bug of some P50 models. Simply put, with those, the BIOS sometimes fails to properly reset/initialize the GPU on boot under certain conditions. Windows seems to reset PCI devices on boot by default, so it is not affected.
https://bugzilla.kernel.org/show_bug.cgi?id=203003

Thanks again! I’ll keep that in mind and maybe port the patch back to my SuSE kernel and for my device IDs.
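
For the device IDs: the “Device Id” line in the nvidia-smi log above (0x1D1010DE) packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. A small Python sketch to split it is below; split_pci_id is just an illustrative name, and whether the patch from the bug report actually applies to this GPU is untested.

```python
def split_pci_id(combined):
    """Split nvidia-smi's 32-bit 'Device Id' into (vendor, device)."""
    vendor = combined & 0xFFFF          # low 16 bits: PCI vendor ID
    device = (combined >> 16) & 0xFFFF  # high 16 bits: PCI device ID
    return vendor, device


vendor, device = split_pci_id(0x1D1010DE)  # "Device Id" from the log above
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
# vendor=0x10de device=0x1d10
```

These should be the same two values that “lspci -nn” prints in brackets for the GPU.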