I’ve been trying to reproduce the DGX Spark performance advertised here, but I can’t reproduce it on either the 3B or the 8B LLM with this exact guide. Instead of the advertised ~80k tokens per second, I get ~11k tokens per second on the 3B model, and instead of ~54k tokens per second, I get ~9k tokens per second on the 8B model. I have written up my current observations here in greater detail, including different backends such as Unsloth and different model sizes. I’m willing to retract my benchmarks, but the reality is that, at least for me, the DGX Spark is currently significantly slower than expected.
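For clarity on how I arrive at a tokens-per-second figure: total generated tokens divided by wall-clock time across a few runs. Here is a minimal sketch of that measurement; the `generate` callable and the stub below are placeholders, not the actual backend API:

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Measure decode throughput as total generated tokens / total wall time.

    `generate` is any callable returning the generated token ids; in a real
    benchmark it would wrap a backend call (e.g. vLLM or llama.cpp bindings).
    """
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(n_runs):
        total_tokens += len(generate(prompt))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed

# Stub standing in for a real LLM backend (hypothetical, for illustration).
def fake_generate(prompt):
    return list(range(256))  # pretend we decoded 256 tokens

rate = tokens_per_second(fake_generate, "hello")
print(f"{rate:.0f} tok/s")
```

With a real backend, batching and prompt length heavily affect this number, which is why I tested multiple backends and model sizes before posting.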
Thank you for the details. I have passed this along to engineering for them to look at.
Please check out our guide for benchmarking different models with different backends: DGX Spark Performance FAQ