Sovgrid.org My non-dev’s engineering log on DGX Spark

Cipherfox · April 29, 2026, 2:44pm

Non-developer’s DGX Spark engineering log: GB10, Mistral Small 4, SGLang, MCP.

Quick honesty up front: I’m not a developer (yet). Got the GB10 / 128 GB Spark last month and figured the fastest way to learn was to build in public and write down what broke. Two months in: 44 articles and counting, an MCP server, and a hybrid public stack, mostly built with Claude Code and Mistral Small 4 119B as pair-programmers.

Why I’m posting this here: most “AI on a desk” guides assume x86_64 plus datacenter GPUs, which is frustrating after you just paid for a Spark. ARMv9.2-A + SM121A breaks half of them, and as a beginner I had to write down every gotcha that cost me days. Maybe useful if you just unboxed yours.

What’s actually documented:

SGLang on GB10 with Mistral Small 4 119B NVFP4 + EAGLE: every flag that matters, the flashinfer vs triton tradeoff (flashinfer OOMs on first batch, took me an embarrassingly long time to figure out why), mem-fraction tuning, the missing config.json / tokenizer.json gotcha, why nightly builds matter.
ComfyUI + FLUX.1-schnell: sequential workflow that fits Mistral and FLUX into 128 GB unified memory without OOMs.
OpenHands + Mistral fix: alternating-roles BadRequest with SGLang, one config flag fixes it.
Hybrid public stack: Spark stays local for inference, a 2 GB VPS handles HTTPS + MCP. Paid in BTC, no KYC. Total infra cost: 13 EUR/month.
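
For anyone who wants to see the shape of the SGLang launch before opening the article, here is a minimal sketch that only assembles the command line. The flag names are standard `sglang.launch_server` options as far as I know; the model paths, the 0.80 memory fraction, and the choice of the triton backend (inferred from the flashinfer OOM mentioned above) are placeholder assumptions, not the exact values from the blog.

```python
import shlex


def build_sglang_command(model_path, draft_model_path, mem_fraction=0.80, port=30000):
    """Assemble an argv list for `python -m sglang.launch_server`.

    All values passed in here are illustrative placeholders.
    """
    if not 0.0 < mem_fraction < 1.0:
        raise ValueError("mem_fraction must be strictly between 0 and 1")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Assumption: triton is used because flashinfer OOMs on the first batch.
        "--attention-backend", "triton",
        # Fraction of memory reserved for weights and KV cache; tune per workload.
        "--mem-fraction-static", str(mem_fraction),
        # EAGLE speculative decoding needs a draft model alongside the main one.
        "--speculative-algorithm", "EAGLE",
        "--speculative-draft-model-path", draft_model_path,
        "--port", str(port),
    ]


# Hypothetical local paths; replace with wherever your checkpoints live.
cmd = build_sglang_command(
    "/models/mistral-small-4-nvfp4",
    "/models/mistral-small-4-eagle",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each flag next to its value, which makes it easier to diff one working config against a broken one.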

Why SGLang and not vLLM (for Mistral): I run both on the Spark; vLLM serves Qwen3-Coder for IDE autocomplete on port 8000. For Mistral Small 4 119B in NVFP4 specifically, SGLang won on three points: nightly-dev-cu13 had the SM 12.1 fix for GB10 earlier (stable images crash with “invalid device ordinal”), EAGLE Spec V2 with Overlap-Scheduler gives ~58% more throughput on this hardware (59 to 93 tok/s steady-state), and the NVFP4 checkpoint loaded cleanly once I patched config.json in. vLLM works fine on the Spark; it just isn’t the right tool for this specific model + quantization + hardware combo right now.
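
As a quick sanity check on the throughput figures quoted above (59 and 93 tok/s), the relative gain can be worked out like this. The helper name is just for illustration.

```python
def percent_gain(baseline_tok_s, new_tok_s):
    """Relative throughput gain of new over baseline, in percent."""
    return (new_tok_s - baseline_tok_s) / baseline_tok_s * 100.0


gain = percent_gain(59, 93)   # steady-state tok/s without and with EAGLE
speedup = 93 / 59
print(f"{gain:.1f}% more throughput, {speedup:.2f}x speedup")
# prints "57.6% more throughput, 1.58x speedup"
```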

The site also exposes an MCP server so agents (or your local Claude Code / Cursor) can query the blog directly:

Endpoint: https://mcp.sovgrid.org/self-hosted-ai
Discovery: https://sovgrid.org/.well-known/ai-plugin.json
Tools: search_blog · get_article · diagnose_sglang
Concept page: https://mcp.sovgrid.org/

If you only try one tool, make it diagnose_sglang. Paste your config and error message, get back the known issue and a link to the article that fixes it. Built specifically for GB10 / SM121A / DGX Spark.
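
If you would rather see what a call looks like at the protocol level, here is a rough sketch of the JSON-RPC message an MCP client sends for a tool call. It only builds and prints the request; it does not contact the server. The `tools/call` method and the `name` / `arguments` layout come from the MCP spec, but the argument keys `config` and `error` are my guess for illustration, so check the tool's published schema for the real field names. A real client (Claude Code, Cursor) also does the initialize handshake for you first.

```python
import json

# Endpoint as listed in the post above.
MCP_ENDPOINT = "https://mcp.sovgrid.org/self-hosted-ai"


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical argument names; the real schema may differ.
request = build_tool_call(
    "diagnose_sglang",
    {
        "config": "--attention-backend flashinfer --mem-fraction-static 0.9",
        "error": "invalid device ordinal",
    },
)
body = json.dumps(request, indent=2)
print(f"POST {MCP_ENDPOINT}")
print(body)
```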

Caveat I want to be upfront about: I’m learning as I go. If you spot a config error or a better approach, please tell me. The whole point of writing this in public is to get corrections from people who actually know what they’re doing.

Visit: https://sovgrid.org
Start here if SGLang is broken: Self-Host Mistral Small 4 with SGLang on NVIDIA DGX Spark (GB10): What Actually Works | Sovereign AI Blog
Agent index: https://sovgrid.org/llms.txt

Feedback / corrections welcome :-)

trithemius · May 1, 2026, 11:17am

Thank you, sir, this is very inspiring. I am also working on projects that must be sovereign (EU sovereign in particular), and I will most probably grab some of your great work ;) Thanks, man!
