Non-developer’s DGX Spark engineering log: GB10, Mistral Small 4, SGLang, MCP.
Quick honesty up front: I’m not a developer (yet). Got the GB10 / 128 GB Spark last month and figured the fastest way to learn was to build in public and write down what broke. Two months in: 44 articles and counting, an MCP server, and a hybrid public stack, built mostly with Claude Code and Mistral Small 4 119B as pair programmers.
Why I’m posting this here: most “AI on a desk” guides assume x86_64 plus datacenter GPUs, which is frustrating after you just paid for a Spark. ARMv9.2-A + SM121A breaks half of them, and as a beginner I had to write down every gotcha that cost me days. Maybe useful if you just unboxed yours.
What’s actually documented:
- SGLang on GB10 with Mistral Small 4 119B NVFP4 + EAGLE: every flag that matters, the flashinfer vs triton tradeoff (flashinfer OOMs on first batch, took me an embarrassingly long time to figure out why), mem-fraction tuning, the missing config.json / tokenizer.json gotcha, why nightly builds matter.
- ComfyUI + FLUX.1-schnell: sequential workflow that fits Mistral and FLUX into 128 GB unified memory without OOMs.
- OpenHands + Mistral fix: alternating-roles BadRequest with SGLang, one config flag fixes it.
- Hybrid public stack: Spark stays local for inference, a 2 GB VPS handles HTTPS + MCP. Paid in BTC, no KYC. Total infra cost: 13 EUR/month.
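For the missing config.json / tokenizer.json gotcha and the backend choice above, a pre-flight check before launching can save a failed start. This is a minimal sketch, not the exact setup from the articles: the model path, the `mem_fraction` value and the port are placeholders I made up, and choosing triton is only my reading of the flashinfer OOM note. EAGLE also needs further `--speculative-*` options that are left out here.

```python
from pathlib import Path

# Files the post reports as missing from the NVFP4 checkpoint directory.
REQUIRED_FILES = ("config.json", "tokenizer.json")


def missing_checkpoint_files(model_dir):
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def build_sglang_command(model_dir, mem_fraction=0.80, port=30000):
    """Build an argv list for sglang.launch_server.

    mem_fraction and port are placeholder values, not the author's settings.
    """
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", str(model_dir),
        # The post reports flashinfer OOMs on the first batch, so this
        # sketch assumes the triton attention backend.
        "--attention-backend", "triton",
        "--mem-fraction-static", str(mem_fraction),
        # Further --speculative-* options for EAGLE are omitted here.
        "--speculative-algorithm", "EAGLE",
        "--host", "127.0.0.1",
        "--port", str(port),
    ]


if __name__ == "__main__":
    model_dir = Path("/models/mistral-small-4-nvfp4")  # hypothetical path
    missing = missing_checkpoint_files(model_dir)
    if missing:
        print("Missing before launch:", ", ".join(missing))
    else:
        print(" ".join(build_sglang_command(model_dir)))
```

Building the command as a list keeps each flag and its value together, so it can be passed straight to `subprocess.run` once the check passes.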
Why SGLang and not vLLM (for Mistral): I run both on the Spark; vLLM serves Qwen3-Coder for IDE autocomplete on port 8000. For Mistral Small 4 119B in NVFP4 specifically, SGLang won on three points: nightly-dev-cu13 had the SM 12.1 fix for GB10 earlier (stable images crash with “invalid device ordinal”), EAGLE Spec V2 with Overlap-Scheduler gives ~58% more throughput on this hardware (59 to 93 tok/s steady-state), and the NVFP4 checkpoint loaded cleanly once I patched config.json in. vLLM works fine on the Spark, it just isn’t the right tool for this specific model + quantization + hardware combo right now.
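A quick way to sanity-check a throughput comparison like this one is to compute the relative gain from the two steady-state figures. A minimal sketch: the function name is mine, and only the 59 and 93 tok/s numbers come from the post.

```python
def relative_gain_percent(baseline, improved):
    """Percentage increase of improved over baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Steady-state figures from the post, in tokens per second.
baseline_tps = 59
eagle_tps = 93
print(f"{relative_gain_percent(baseline_tps, eagle_tps):.1f}% more throughput")
```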
The site also exposes an MCP server so agents (or your local Claude Code / Cursor) can query the blog directly:
- Endpoint: https://mcp.sovgrid.org/self-hosted-ai
- Discovery: https://sovgrid.org/.well-known/ai-plugin.json
- Tools: search_blog · get_article · diagnose_sglang
- Concept page: https://mcp.sovgrid.org/
If you only try one tool, make it diagnose_sglang. Paste your config and error message, get back the known issue and a link to the article that fixes it. Built specifically for GB10 / SM121A / DGX Spark.
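For anyone wiring this into their own agent, MCP tool calls are JSON-RPC 2.0 requests using the `tools/call` method. The sketch below only builds the request body and sends nothing. The argument names `config` and `error` are my guesses and the example values are made up, so check the real input schema in the server's `tools/list` response. A real client also has to initialize an MCP session first.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request body for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the keys "config" and "error" are assumptions,
# and both values are illustrative, not real output from a Spark.
request = build_tool_call(
    "diagnose_sglang",
    {
        "config": "--attention-backend flashinfer --mem-fraction-static 0.9",
        "error": "out of memory on first batch",
    },
)
print(json.dumps(request, indent=2))
```

In practice Claude Code or Cursor builds this request for you once the endpoint is registered; the sketch is only meant to show what goes over the wire.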
Caveat I want to be upfront about: I’m learning as I go. If you spot a config error or a better approach, please tell me. The whole point of writing this in public is to get corrections from people who actually know what they’re doing.
- Visit: https://sovgrid.org
- Start here if SGLang is broken: “Self-Host Mistral Small 4 with SGLang on NVIDIA DGX Spark (GB10): What Actually Works” (Sovereign AI Blog)
- Agent index: https://sovgrid.org/llms.txt
Feedback / corrections welcome :-)