For the past week I have been seeing a significant performance regression on my DGX Spark system. It affects both LLM inference (LM Studio, llama.cpp) and PyTorch workloads. Before that, performance had been stable with no issues for several months.
Issue Summary:
- LLM inference now runs at roughly half the previous tokens per second, using identical models and configurations.
- PyTorch inference tasks take about twice as long to complete as before.
- The regression is reproducible across multiple runs.
Observed Behavior:
- Under full GPU load, nvidia-smi reports ~96% GPU utilization.
- However, GPU power draw never exceeds roughly 14 W.
- Before the regression, similar workloads drew noticeably more than 14 W.
- At idle, the system consistently draws about 5 W.
The low power draw, despite high GPU utilization, suggests that the GPU may not be entering the expected performance state during intensive workloads.
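To check for this symptom (high reported utilization with unusually low power draw), you could poll nvidia-smi while a workload is running. The sketch below is a minimal, hypothetical helper, not an NVIDIA tool. It assumes nvidia-smi on the Spark reports the `utilization.gpu` and `power.draw` query fields (it treats `[N/A]` as unreported), and the 90% / 20 W thresholds are illustrative guesses based on the ~96% / ~14 W readings above, not official limits.

```python
import subprocess
import time

# Standard nvidia-smi query fields (see `nvidia-smi --help-query-gpu`).
QUERY = "utilization.gpu,power.draw"

# Hypothetical thresholds for illustration only, loosely based on the
# ~96% utilization / ~14 W readings reported above. Tune for your unit.
UTIL_THRESHOLD_PCT = 90.0
POWER_THRESHOLD_W = 20.0


def parse_sample(line):
    """Parse one 'util, power' CSV line into floats; None if unreported."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 2:
        return None
    try:
        return float(fields[0]), float(fields[1])
    except ValueError:
        # nvidia-smi prints "[N/A]" for fields a device does not report.
        return None


def looks_power_limited(util_pct, power_w,
                        util_threshold=UTIL_THRESHOLD_PCT,
                        power_threshold=POWER_THRESHOLD_W):
    """True when the GPU is busy but drawing suspiciously little power."""
    return util_pct >= util_threshold and power_w < power_threshold


def sample_gpu():
    """Take one reading for the first GPU listed by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_sample(out.splitlines()[0]) if out else None


def watch(samples=10, interval_s=1.0):
    """Print a reading every interval_s seconds; run while a model is busy."""
    for _ in range(samples):
        reading = sample_gpu()
        if reading is None:
            print("utilization/power not reported")
        else:
            util, power = reading
            status = "SUSPECT low-power state" if looks_power_limited(util, power) else "ok"
            print(f"util={util:.0f}% power={power:.1f} W -> {status}")
        time.sleep(interval_s)
```

Call `watch()` while a model is generating tokens. Repeated "SUSPECT" lines would match the pattern described in this thread. Only the first GPU is checked and each reading is a snapshot, so several samples are more informative than one.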
Thanks @christian.pappert for sharing your observations and linking your thread. That helped confirm I wasn’t the only one seeing this.
I was able to resolve the issue on my end with a full power cycle. Unplugging the USB-C connector from the DGX Spark itself did not fix it, but completely unplugging the power supply from the wall outlet for a minute or so and then plugging it back in restored normal behavior.
After reconnecting, GPU power draw under full workload returned to expected levels and performance has returned to normal.
For reference, I was using the original power supply that shipped with the unit.
I still don’t know what originally caused it or how to reproduce it, but the full AC disconnect resolved it immediately.
Hopefully this helps anyone else who runs into the same issue.
I'm using everything out of the box and seem to be getting stuck in a low-power mode because the mailbox doesn't load on startup. I'm not sure what that error means, but here's the bug report.
My three MSI EdgeXperts have had no problems, but my DGX Spark became very slow. I completely unplugged the power adapter from the outlet for about a minute, plugged it back in, and the DGX Spark's performance returned to normal. Something seems wrong here. The slowdown was especially noticeable when building llama.cpp from source: the three MSI EdgeXperts finished quickly, while the DGX Spark built much, much more slowly, which struck me as strange.
I've read that the latest firmware may fix some issues but also introduces new ones, most notably a massive performance drop on the ConnectX-7 interfaces. Have you observed anything like that on your end?
Got hit with this today… As far as I can tell, it happened after my Spark crashed while loading too many models into memory. Performance went from 63 t/s on qwen3.5-35b-a3b down to 29 t/s. I can confirm that unplugging the power supply from the wall for a couple of minutes resolved the issue.
It happened again today (April 3rd): the system crashed while running a ComfyUI video upscaler, and after the reboot performance is terrible. I guess I'll have to hook up a smart plug as a hacky workaround. I wish NVIDIA would fix this!!!
Thank you for this discussion. Unplugging it for a few minutes solved my issue. After a similar hang and hard reboot, I saw really slow performance (Ollama with qwen3.5:122b at around 11 tok/s); after the power reset, it's back to 24+ tok/s as before. More people should know about this in case it happens to them (and maybe the devs can find a permanent fix). Thanks, all!
My unit never crashed, but I ran into this power issue recently. I was getting 30 tok/s while other people were reporting 50 tok/s. ChatGPT suggested unplugging the unit from the power socket, and that solved my problem. I wasted several days checking for missing patches and wrong settings. It's frustrating when the hardware is unreliable.
I've encountered the same issue and reported it here: GB10 is power limited after crash. The hardware seems to have a problem, and NVIDIA doesn't seem to understand what's going on. The alternative is that they're aware and are trying to cover it up because they know it affects many units, or all of them.
If we encounter this problem, does it point to a serious hardware fault that could render these DGX Spark clones useless? They aren't particularly cheap.
I ran into the same issue on my ASUS Ascent DGX Spark CFF GX10 and it’s honestly pretty frustrating.
I spent several hours checking everything I could think of (drivers, firmware versions, even the Linux kernel) trying to figure out why performance suddenly dropped. Nothing looked wrong at all.
In the end, unplugging the power supply and plugging it back in fixed it immediately.
This has now happened to me 3 times within the last month. From what I’ve observed, it seems to occur after the system wakes up from sleep mode. Definitely feels like some kind of power or hardware state issue rather than software.
It would be great to get an official explanation or fix from NVIDIA, because this is really not something you'd expect at this price point.