DGX Spark Performance Degradation - GPU Power Draw Issue

I’m not sure this is the same issue as in this thread, but I’ll err on the side that it’s relevant instead of creating another thread.

My symptom: power draw is clamped at ~40 W even when GPU utilization is 95%+ and it is grinding on a prompt. The model doesn’t seem to make a difference, and the GPU temperature is normal, even low (< 60 °C). Unplugging the unit and letting everything drain for several minutes doesn’t help either.

TL;DR I might have gotten a firmware mismatch. But even if that’s not the case, I’d like to get the SW power capping issue resolved.

Details:

I just got an ASUS GX10 last week (1TB model). I did all the updates including firmware updates using the prescribed methods - the control panel and standard apt update.

Here’s the output on firmware versions (I’ll come back to this in a minute)

    Firmware Component Name: FLASH
    Firmware Version: SBP:R:2.148.24
    Firmware ID: Not Specified

    Firmware Component Name: UEFI
    Firmware Version: ASUS_UEFI_0104
    Firmware ID: Not Specified
    Release Date: Not Specified

    Firmware Component Name: EC Firmware
    Firmware Version: 2.78.24
    Firmware ID: Not Specified

    Firmware Component Name: PD Firmware
    Firmware Version: PD0 FW1: 5.7, FW2: 5.7
    Firmware ID: Not Specified
    Release Date: Not Specified

    Firmware Component Name: PD Firmware
    Firmware Version: PD1 FW1: 5.7, FW2: 5.7
    Firmware ID: Not Specified
    Release Date: Not Specified


Unplugging and letting it sit doesn’t help, so I dug in to figure out why. Right after boot, before I have done anything at all, the power already gets a SW clamp.

Immediately after boot I logged in and dumped info:

    ==============NVSMI LOG==============

    Timestamp                                              : Fri May 22 02:26:40 2026
    Driver Version                                         : 580.159.03
    CUDA Version                                           : 13.0

    Attached GPUs                                          : 1
    GPU 0000000F:01:00.0
        Performance State                                  : P8
        Clocks Event Reasons
            Idle                                           : Not Active
            Applications Clocks Setting                    : Not Active
            SW Power Cap                                   : Not Active
            HW Slowdown                                    : Not Active
                HW Thermal Slowdown                        : Not Active
                HW Power Brake Slowdown                    : Not Active
            Sync Boost                                     : Not Active
            SW Thermal Slowdown                            : Not Active
            Display Clock Setting                          : Not Active
        Clocks Event Reasons Counters
            SW Power Capping                               : 2411654 us
            Sync Boost                                     : 0 us
            SW Thermal Slowdown                            : 0 us
            HW Thermal Slowdown                            : 0 us
            HW Power Braking                               : 0 us
        Sparse Operation Mode                              : N/A


Note that SW Power Cap is “Not Active”, but the SW Power Capping counter already shows 2.4 s. I did nothing but run a few command-line checks to figure out what was going on, yet about a minute and a half later (by timestamp) the capping counter has gone up and SW Power Cap is now “Active”. There is literally nothing going on, though: the power draw is only about 10 W.

    ==============NVSMI LOG==============

    Timestamp                                              : Fri May 22 02:28:14 2026
    Driver Version                                         : 580.159.03
    CUDA Version                                           : 13.0

    Attached GPUs                                          : 1
    GPU 0000000F:01:00.0
        Performance State                                  : P8
        Clocks Event Reasons
            Idle                                           : Not Active
            Applications Clocks Setting                    : Not Active
            SW Power Cap                                   : Active
            HW Slowdown                                    : Not Active
                HW Thermal Slowdown                        : Not Active
                HW Power Brake Slowdown                    : Not Active
            Sync Boost                                     : Not Active
            SW Thermal Slowdown                            : Not Active
            Display Clock Setting                          : Not Active
        Clocks Event Reasons Counters
            SW Power Capping                               : 37335609 us
            Sync Boost                                     : 0 us
            SW Thermal Slowdown                            : 0 us
            HW Thermal Slowdown                            : 0 us
            HW Power Braking                               : 0 us
        Sparse Operation Mode                              : N/A
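For anyone who wants to put a number on this, here is a minimal sketch that works out what share of the wall-clock time between two dumps was spent with the SW power cap engaged. It only parses text in the format shown above; the function names (`parse_snapshot`, `capped_fraction`) are made up for illustration, and the two sample strings are trimmed copies of the dumps above. It assumes the `SW Power Capping` counter is cumulative and that the `Timestamp` line is in an English locale.

```python
import re
from datetime import datetime


def parse_snapshot(text):
    """Return (timestamp, sw_power_capping_us) from one dump in the format above."""
    ts = re.search(r"^\s*Timestamp\s*:\s*(.+?)\s*$", text, re.MULTILINE)
    cap = re.search(r"^\s*SW Power Capping\s*:\s*(\d+)\s*us\s*$", text, re.MULTILINE)
    if ts is None or cap is None:
        raise ValueError("dump is missing Timestamp or SW Power Capping")
    # %a and %b are locale-dependent; this assumes English day/month names.
    when = datetime.strptime(ts.group(1), "%a %b %d %H:%M:%S %Y")
    return when, int(cap.group(1))


def capped_fraction(first, second):
    """Fraction of the time between two dumps during which the counter advanced."""
    t0, c0 = parse_snapshot(first)
    t1, c1 = parse_snapshot(second)
    wall_us = (t1 - t0).total_seconds() * 1_000_000
    if wall_us <= 0:
        raise ValueError("second dump must be later than the first")
    return (c1 - c0) / wall_us


# Trimmed copies of the two dumps posted above.
SNAP_A = """
Timestamp                                              : Fri May 22 02:26:40 2026
        SW Power Cap                                   : Not Active
        SW Power Capping                               : 2411654 us
"""
SNAP_B = """
Timestamp                                              : Fri May 22 02:28:14 2026
        SW Power Cap                                   : Active
        SW Power Capping                               : 37335609 us
"""

print(f"{capped_fraction(SNAP_A, SNAP_B):.1%} of the interval was spent power capped")
```

With the two dumps above, the counter advanced by roughly 35 s over a 94 s interval, so the cap was engaged for a bit over a third of the time on an otherwise idle machine.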

I looked at POWER in the log ( nvidia-smi -q -d POWER ) and see this:

    ==============NVSMI LOG==============

    Timestamp                                              : Fri May 22 02:19:19 2026
    Driver Version                                         : 580.159.03
    CUDA Version                                           : 13.0

    Attached GPUs                                          : 1
    GPU 0000000F:01:00.0
        GPU Power Readings
            Average Power Draw                             : 4.33 W
            Instantaneous Power Draw                       : 4.40 W
            Current Power Limit                            : N/A
            Requested Power Limit                          : N/A
            Default Power Limit                            : N/A
            Min Power Limit                                : N/A
            Max Power Limit                                : N/A
        Power Samples
            Duration                                       : Not Found
            Number of Samples                              : Not Found
            Max                                            : Not Found
            Min                                            : Not Found
            Avg                                            : Not Found
        GPU Memory Power Readings
            Average Power Draw                             : N/A
            Instantaneous Power Draw                       : N/A
        Module Power Readings
            Average Power Draw                             : N/A
            Instantaneous Power Draw                       : N/A
            Current Power Limit                            : N/A
            Requested Power Limit                          : N/A
            Default Power Limit                            : N/A
            Min Power Limit                                : N/A
            Max Power Limit                                : N/A

I’m guessing these should be populated? Is the SW clamping the power because it can’t read current/requested/min/max?
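If others want to check whether their unit reports the same gaps, here is a rough sketch that takes the text of `nvidia-smi -q -d POWER` and lists which power-limit fields come back as `N/A`. The helper names (`parse_sections`, `missing_limits`) are hypothetical, and the parsing assumes the layout shown above: section headers are lines without a ` : ` separator, and everything else is `key : value`. It says nothing about whether `N/A` is expected on this platform; it only makes the comparison between machines easier.

```python
def parse_sections(text):
    """Group 'key : value' lines under the most recent header line."""
    sections = {"": {}}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("="):
            continue  # skip blank lines and the NVSMI LOG banner
        key, sep, value = line.partition(" : ")
        if sep:
            sections[current][key.strip()] = value.strip()
        else:
            # A line without ' : ' is treated as a section header.
            current = line
            sections.setdefault(current, {})
    return sections


def missing_limits(sections, section="GPU Power Readings"):
    """Return the '... Power Limit' keys in a section whose value is N/A."""
    fields = sections.get(section, {})
    return [k for k, v in fields.items() if k.endswith("Power Limit") and v == "N/A"]


# Trimmed copy of the POWER dump posted above.
SAMPLE = """
GPU 0000000F:01:00.0
    GPU Power Readings
        Average Power Draw                             : 4.33 W
        Instantaneous Power Draw                       : 4.40 W
        Current Power Limit                            : N/A
        Requested Power Limit                          : N/A
        Default Power Limit                            : N/A
        Min Power Limit                                : N/A
        Max Power Limit                                : N/A
"""

print(missing_limits(parse_sections(SAMPLE)))
```

To run it against a live machine, the output of the command could be captured with `subprocess.run(["nvidia-smi", "-q", "-d", "POWER"], capture_output=True, text=True)` and passed to `parse_sections`.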

I double checked I had the latest FW installed and such using the CLI tools, and then I got curious and went over to see what ASUS had on their page.

There is one difference between ASUS’ latest package and what I have. ASUS’ firmware package has 104 for the UEFI BIOS but 2.78.18.3 for the EC. My machine has 104 for the UEFI BIOS, but 2.78.24 for the EC.

Did I maybe end up with a firmware mismatch between the UEFI/BIOS and the EC so that the BIOS etc. can’t exchange info with the EC and that’s why the SW Power Limit comes on?
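As a quick sanity check on which EC build is the newer one, here is a tiny sketch comparing the two version strings quoted above. It assumes EC versions are ordered as dotted integers, which is a guess about the vendor's numbering scheme; `version_key` is just an illustrative helper. It cannot tell whether a given EC/UEFI pairing is supported, only which number sorts higher.

```python
def version_key(version):
    """Turn a dotted version string such as '2.78.18.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


installed_ec = "2.78.24"    # what the machine reports
packaged_ec = "2.78.18.3"   # what the ASUS firmware package lists

if version_key(installed_ec) > version_key(packaged_ec):
    print("installed EC sorts newer than the one in the ASUS package")
else:
    print("installed EC is the same as or older than the one in the ASUS package")
```

Under that assumption the installed EC (2.78.24) sorts newer than the packaged one (2.78.18.3), so if there is a mismatch it would be an EC that is ahead of the published bundle rather than behind it.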

I don’t think this is unusual. I’ve seen around 35 W usage with a single prompt, but if I run 10 concurrent prompts, it goes up closer to 100 W. I think it’s possible for the GPU to be fully “utilised” without necessarily drawing its maximum power.

Yes, that’s normal for me too. Sometimes a single prompt bursts, but it takes concurrent requests to max it out. I tend to watch temperature rather than watts.

@DannyTup @giles8 OK - thanks!

Intuitively I would think it would “throw everything at it” but this is a new machine for me and I definitely don’t know how power is managed. I’ll watch the temp and also compare tok/s with what other people tend to get on their models. Thanks again!
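For comparing power, utilization, and temperature side by side while changing the number of concurrent prompts, something like the sketch below may help. It relies on `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` with the `power.draw`, `utilization.gpu`, and `temperature.gpu` fields; whether all three are populated on this platform is an assumption, so unreadable values are returned as `None`. The function names are illustrative only.

```python
import subprocess

FIELDS = ("power.draw", "utilization.gpu", "temperature.gpu")


def parse_sample(csv_line):
    """Parse one CSV line from nvidia-smi into {field: float or None}."""
    values = [v.strip() for v in csv_line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    sample = {}
    for name, value in zip(FIELDS, values):
        try:
            sample[name] = float(value)
        except ValueError:
            # Fields the driver cannot report come back as text such as [N/A].
            sample[name] = None
    return sample


def read_sample():
    """Query the first GPU once; requires nvidia-smi on PATH."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(FIELDS)}",
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sample(result.stdout.splitlines()[0])


# Example with a hand-written line instead of a live query.
print(parse_sample("35.12, 96, 58"))
```

Calling `read_sample()` once a second during a single-prompt run and again during a multi-prompt run would show whether power scales with concurrency while utilization stays pinned near 100%.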

It started happening on my second atom, only a week in. I think OOM causes it.

Would you mind sharing recipes that work?

After consistently getting worse results than any of the recipes I tried suggested on my three MSI EdgeXperts, I finally found this thread yesterday.

All three, bought weeks apart, were suffering from this issue. Not one of them would go above 7-11 W regardless of how much load I put on it, and they ran cold at around 42 °C too.

Yesterday I applied all current firmware updates and I’m pleased to say they all now scale up as expected. I was running two of them on heavy workloads and power consumption scaled right up.

The speed increase is crazy. I’ve clearly been putting up with terrible performance for however long the issue has been in place.

I didn’t need to unplug the power cable. I was expecting to, given other threads, but when the units came back up after the updates the speed was back to where it should be.

Monitoring but really hoping this is now fixed on my MSIs.

This might be expected if the firmware updates force a reset of the various components when the firmware is updated.

I haven’t been able to get a unit out of that broken throttled state no matter what. Rebooting and resetting the GPU didn’t help. The only fix is to cut power by unplugging the power brick or by removing the USB-C power cable from the unit.

This hardware gives off a half-baked, experimental, and not entirely functional vibe. The power-related bugs, the USB-C ports that can’t work at full speed, throttling due to inadequate cooling, and high idle power usage with connected QSFP56 cables are only a few of the issues. This hardware is really bad for the amount they charge for it. Nvidia has done a poor job so far.