Google Gemma 4 - Will it work on DGX Spark?

Sure. It might take a few hours until support pops up across all the inference servers.

I already tried vLLM with the latest Transformers v5.5.0 (which is required), but I failed:

llama.cpp has already added support: