Slow inference with 31b model Gemma 4? Optimizations?

0rand · June 10, 2026, 11:24am

You probably should have done your research and never bought them in the first place. If you can host and afford an industrial-grade rack server, you shouldn’t have considered Sparks; they are home/dev boxes for individuals. I can put my two Sparks in a backpack and work while traveling on an airplane. Even four would fit. You can’t do that with your server. That’s the whole difference, mate. But at the scale of 4+ Sparks the economics stop making any sense. Just rent a cluster on Lambda or Runpod and run your tasks. You don’t need it running 24x7, and you pay by the minute. Spark is for enthusiasts and tinkerers; it is far from plug-and-play and not for a corporate buyer who wants to cheap out on boxes.

co-le · June 11, 2026, 4:26pm

Check whether you’re suffering from the power delivery bug; that seems extra low.

On 2x Spark I recommend two picks: Minimax M2.7 AWQ and DeepSeek 4 Flash (FP8/4 straight from DeepSeek).

I’m running the latter right now with a 500K context (very comfy), and with the proper config (MTP) it sustains 40 tps across the whole context length. I showed my recipe here
