vLLM on GB10: gpt-oss-120b MXFP4 slower than SGLang/llama.cpp... what’s missing?

Sorry! My earlier advice was poor!

Official docs are here: OS and Component Update Guide — DGX Spark User Guide

Could you try my repro Docker image or Eugr’s, and compare against your setup to see whether that brings you up to the expected performance?

For temps, there’s a readout built into the DGX Dashboard, or you can go ‘grey beard’ on the command line with something like:

```
# one-shot detailed temperature report
nvidia-smi -q -d TEMPERATURE

# live refresh every second
watch -n 1 nvidia-smi

# tight CSV loop (nice for logs)
nvidia-smi --query-gpu=timestamp,temperature.gpu,fan.speed,power.draw,clocks.sm --format=csv -l 1
```
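If you redirect that CSV loop to a file, parsing it back is straightforward. A minimal Python sketch, assuming the default five-field query above (the sample line here is made up for illustration; real values depend on your GPU and driver):

```python
import csv
import io

# A hypothetical line in the shape nvidia-smi emits for:
#   --query-gpu=timestamp,temperature.gpu,fan.speed,power.draw,clocks.sm --format=csv,noheader
sample = "2024/01/01 12:00:00.000, 45, 30 %, 25.00 W, 1500 MHz"

# skipinitialspace drops the space nvidia-smi puts after each comma
row = next(csv.reader(io.StringIO(sample), skipinitialspace=True))
timestamp, temp_c, fan, power, sm_clock = row

print(temp_c)   # temperature in °C as a string, e.g. "45"
```

Using `--format=csv,noheader` (instead of plain `csv`) skips the header row, so every line of the log parses the same way.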

I use htop on the command line to look at processes and memory/GPU/CPU utilization.