vLLM on GB10: gpt-oss-120b MXFP4 slower than SGLang/llama.cpp... what’s missing?

Sorry! My earlier advice was poor!

Official docs are here: OS and Component Update Guide — DGX Spark User Guide

Could you try my repro Docker image or Eugr’s, and compare against your setup to see whether that brings you up to the expected performance?

For temps, there’s a readout built into the DGX Dashboard, or you can go ‘grey beard’ on the command line with something like:

```
# one-shot detailed temperature report
nvidia-smi -q -d TEMPERATURE

# live refresh every second
watch -n 1 nvidia-smi

# tight CSV loop (nice for logs)
nvidia-smi --query-gpu=timestamp,temperature.gpu,fan.speed,power.draw,clocks.sm --format=csv -l 1
```
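If you redirect that CSV loop to a file, parsing it back is straightforward. A minimal Python sketch, assuming the default five-field query above (the sample line here is made up for illustration; real values depend on your GPU and driver):

```python
import csv
import io

# A hypothetical line in the shape nvidia-smi emits for:
#   --query-gpu=timestamp,temperature.gpu,fan.speed,power.draw,clocks.sm --format=csv,noheader
sample = "2024/01/01 12:00:00.000, 45, 30 %, 25.00 W, 1500 MHz"

# skipinitialspace drops the space nvidia-smi puts after each comma
row = next(csv.reader(io.StringIO(sample), skipinitialspace=True))
timestamp, temp_c, fan, power, sm_clock = row

print(temp_c)   # temperature in °C as a string, e.g. "45"
```

Using `--format=csv,noheader` (instead of plain `csv`) skips the header row, so every line of the log parses the same way.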

I use htop on the command line to look at processes and memory/GPU/CPU utilization.