[Guide] Uncensored Gemma-4-26B at 45 tok/s on DGX Spark — Actually Feels Great to Use!

user99333 · April 13, 2026, 7:39pm

Hey DGX Spark community! 👋

I’ve been experimenting with LLM inference on my DGX Spark and found a setup that not only gets 45+ tokens/second but actually feels great to use day-to-day.

GitHub Repo: ZengboJamesWang/dgx-spark-vllm-gemma4-26b-uncensored (high-performance uncensored Gemma-4-26B inference on NVIDIA DGX Spark using vLLM, 45+ tok/s)

🚀 What Makes This Special: UNCENSORED + FAST

UNCENSORED — No Filtered Responses

This is the AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4 model. It’s completely uncensored — no alignment filtering, no refusals, no “I cannot help with that” walls. It responds directly and honestly without the typical guardrails. This is genuinely refreshing if you’re tired of models that over-refuse or give sanitized answers.

BLAZING FAST with OpenClaw

When paired with OpenClaw, this setup feels incredibly responsive:

Responses stream in smoothly without lag
Long outputs finish quickly
The typing experience is fluid and satisfying

It doesn’t feel like you’re waiting for a model — it feels like a tool that keeps up with you. Very good feeling overall!

Performance Comparison

Tested on DGX Spark with max_tokens=200, warmup excluded:

Setup	Model	Speed	Memory
This Setup ✅	Gemma-4-26B Uncensored NVFP4 (MoE)	45.26 tok/s	~16.3 GB
vLLM LilaRest 31B	Gemma-4-31B NVFP4 (Dense)	9.16 tok/s	~18.5 GB
Ollama	gemma4:31b (Dense)	8.05 tok/s	~19 GB
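
For anyone who wants the relative numbers, here is a small sketch that just does the arithmetic on the table above. Nothing here is newly measured; the dictionary keys are labels made up for illustration, and the speeds are copied from the table.

```python
# Speeds (tok/s) copied from the comparison table; keys are illustrative labels.
results_tok_s = {
    "this_setup_26b_moe_nvfp4": 45.26,
    "vllm_lilarest_31b_dense_nvfp4": 9.16,
    "ollama_gemma4_31b_dense": 8.05,
}


def speedup(baseline_tok_s: float, candidate_tok_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline speed must be positive")
    return candidate_tok_s / baseline_tok_s


fastest = results_tok_s["this_setup_26b_moe_nvfp4"]
for name, tok_s in results_tok_s.items():
    if name != "this_setup_26b_moe_nvfp4":
        print(f"{name}: {speedup(tok_s, fastest):.2f}x slower than this setup")
```

That works out to roughly 4.9x over the vLLM 31B dense run and roughly 5.6x over Ollama. Keep in mind the comparison is between a 26B MoE model and 31B dense models, so it is not a like-for-like model comparison.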

Quick Start

git clone https://github.com/ZengboJamesWang/dgx-spark-vllm-gemma4-26b-uncensored.git
cd dgx-spark-vllm-gemma4-26b-uncensored
bash scripts/start.sh
bash scripts/benchmark.sh
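
If you want to sanity-check numbers yourself, below is a minimal sketch of how a "warmup excluded" tok/s figure can be computed. It is not the repo's benchmark.sh, whose internals are not shown in this post. The `generate` callable is a hypothetical stand-in for whatever sends a request to your server and returns the number of completion tokens produced; `warmup_runs` and `timed_runs` are assumed parameters.

```python
import time
from typing import Callable


def measure_tok_per_s(
    generate: Callable[[int], int],
    max_tokens: int = 200,
    warmup_runs: int = 1,
    timed_runs: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Average decode throughput in tokens/second, ignoring warmup runs.

    `generate(max_tokens)` must run one request and return the number of
    completion tokens it actually produced (hypothetical interface).
    """
    # Warmup requests are executed but not timed.
    for _ in range(warmup_runs):
        generate(max_tokens)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(timed_runs):
        start = clock()
        produced = generate(max_tokens)
        total_seconds += clock() - start
        total_tokens += produced

    if total_seconds <= 0:
        raise ValueError("no measurable time elapsed")
    return total_tokens / total_seconds
```

Note that this measures end-to-end time per request, so prompt processing is included in the figure; a streaming client that starts the clock at the first token would report a somewhat higher number.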

Happy (uncensored) inferencing! 🚀

user68884 · April 15, 2026, 11:55am

Does not even run as configured, do not waste your time.

user99333 · April 15, 2026, 1:51pm

Please let me know what the error is. It works well on my DGX.

christian176 · April 15, 2026, 4:41pm

I can confirm this works. I use it as a “smaller” agentic model with vision capabilities, and it works like a charm at 48.78 t/s. I am running it with eugr’s community spark-vllm more or less out of the box: just take the recipe, change the model name, and adjust the VRAM consumption (e.g. 0.3 with a 132K context size).
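
As a rough illustration of those two knobs, here is a sketch that assembles a plain `vllm serve` command line using vLLM's `--gpu-memory-utilization` and `--max-model-len` options. It is not eugr's recipe format, which may set these differently. The helper name is made up, and 132000 is only an assumed reading of "132K", since the exact token count is not given above.

```python
import shlex
from typing import List


def build_vllm_serve_args(
    model: str, gpu_memory_utilization: float, max_model_len: int
) -> List[str]:
    """Build an argument list for `vllm serve` (hypothetical helper)."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    if max_model_len <= 0:
        raise ValueError("max_model_len must be positive")
    return [
        "vllm", "serve", model,
        # Fraction of GPU memory vLLM is allowed to use.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
        # Maximum context length in tokens.
        "--max-model-len", str(max_model_len),
    ]


args = build_vllm_serve_args(
    "AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4",
    gpu_memory_utilization=0.3,
    max_model_len=132_000,  # assumption: "132K" taken as 132,000 tokens
)
print(shlex.join(args))
```

The printed line can be pasted into a shell on a machine where vLLM is installed, or the two values can be transferred into whatever recipe file you use.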
