Hi,
I’ve been using a DGX Spark system for about a month. I’m still relatively new to this area, but I’ve been learning through hands-on experience.
I’m currently encountering a performance issue that I don’t fully understand.
When running LLMs, especially larger models like Gemma 4 26B and Qwen 3.6 27B, I consistently observe that the GB10 GPU's power draw stays around 37W and does not scale higher under load.
Because of this, the inference speed is significantly slower than expected.
What I find confusing:
- The GPU appears to be active, but power usage is very low
- It never ramps up beyond ~37W, even during sustained inference
- This happens consistently across different models
I’m trying to determine whether:
- This is expected behavior for the GB10 (power-limited by design?), or
- There is a configuration/software issue (e.g., CUDA, driver, vLLM, PyTorch) that keeps the GPU from running at full capacity
If anyone has experience with DGX Spark or GB10, I would really appreciate your insights.
Additional context:
- Workload: LLM inference (vLLM / similar frameworks)
- Models tested: Gemma 4 26B, Qwen 3.6 27B
- Issue: low GPU power draw (~37W cap) and slow responses
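To rule out a one-off reading, I've been sampling power draw, SM clock, and utilization while vLLM is serving. A minimal sketch of that polling (note: `power.draw`, `clocks.sm`, and `utilization.gpu` are standard `nvidia-smi --query-gpu` fields, but some of them may report `N/A` on GB10, so the parser below tolerates that):

```python
import subprocess

# Standard nvidia-smi query fields; some may be unsupported on a given GPU.
FIELDS = "power.draw,clocks.sm,utilization.gpu"


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits.

    Unsupported fields come back as "N/A" (or "[N/A]"), so map those to None
    instead of failing on float()/int() conversion.
    """
    power, sm_clock, util = (f.strip() for f in line.split(","))
    na = ("N/A", "[N/A]")
    return {
        "power_w": None if power in na else float(power),
        "sm_clock_mhz": None if sm_clock in na else int(sm_clock),
        "util_pct": None if util in na else int(util),
    }


def sample_gpu():
    """Query GPU 0 once and return a parsed sample dict."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_sample(out.strip().splitlines()[0])
```

Calling `sample_gpu()` in a loop during a sustained inference run shows whether power or SM clocks ever ramp. `nvidia-smi -q -d PERFORMANCE` should also list clock throttle reasons, which might indicate whether the ~37W behavior is a firmware/driver power cap rather than a framework issue.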
Thanks in advance for any suggestions.
cychen@spark-7e3d:~/Downloads$ nvidia-smi
Fri Apr 24 22:50:47 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142 Driver Version: 580.142 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GB10 On | 0000000F:01:00.0 On | N/A |
| N/A 54C P0 35W / N/A | Not Supported | 95% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2718 C …chen/talktype/venv/bin/python 3744MiB |
| 0 N/A N/A 225204 G /usr/lib/xorg/Xorg 423MiB |
| 0 N/A N/A 225386 G /usr/bin/gnome-shell 256MiB |
| 0 N/A N/A 226028 G …exec/xdg-desktop-portal-gnome 67MiB |
| 0 N/A N/A 1224186 G /usr/bin/nautilus 68MiB |
| 0 N/A N/A 2051877 G …/.mount_ObsiditW7Vs3/obsidian 63MiB |
| 0 N/A N/A 2404386 G …/8188/usr/lib/firefox/firefox 68MiB |
| 0 N/A N/A 2559727 G /usr/share/code/code 180MiB |
| 0 N/A N/A 3567994 C VLLM::EngineCore 73908MiB |
+-----------------------------------------------------------------------------------------+
