Sovgrid.org: my non-dev engineering log on DGX Spark

Non-developer’s DGX Spark engineering log: GB10, Mistral Small 4, SGLang, MCP.

Quick honesty up front: I’m not a developer (yet). Got the GB10 / 128 GB Spark last month and figured the fastest way to learn was to build in public and write down what broke. Two months in: 44 articles and counting, an MCP server, and a hybrid public stack, mostly with Claude Code and Mistral Small 4 119B as pair-programmers.

Why I’m posting this here: most “AI on a desk” guides assume x86_64 plus datacenter GPUs, which is frustrating after you just paid for a Spark. ARMv9.2-A + SM121A breaks half of them, and as a beginner I had to write down every gotcha that cost me days. Maybe useful if you just unboxed yours.

What’s actually documented:

  • SGLang on GB10 with Mistral Small 4 119B NVFP4 + EAGLE: every flag that matters, the flashinfer vs triton tradeoff (flashinfer OOMs on first batch, took me an embarrassingly long time to figure out why), mem-fraction tuning, the missing config.json / tokenizer.json gotcha, why nightly builds matter.
  • ComfyUI + FLUX.1-schnell: sequential workflow that fits Mistral and FLUX into 128 GB unified memory without OOMs.
  • OpenHands + Mistral fix: alternating-roles BadRequest with SGLang, one config flag fixes it.
  • Hybrid public stack: Spark stays local for inference, a 2 GB VPS handles HTTPS + MCP. Paid in BTC, no KYC. Total infra cost: 13 EUR/month.
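The config.json / tokenizer.json gotcha and the attention-backend / mem-fraction choices from the first bullet can be turned into a small pre-flight check. This is only a sketch under assumptions: the helper names are made up for illustration, the 0.80 memory fraction is a placeholder and not a tuned value, and the EAGLE flags are left out. `--model-path`, `--attention-backend` and `--mem-fraction-static` are regular `sglang.launch_server` arguments.

```python
import tempfile
from pathlib import Path

# Files the post says can be missing from the NVFP4 checkpoint directory.
REQUIRED_FILES = ("config.json", "tokenizer.json")


def missing_checkpoint_files(checkpoint_dir):
    """Return the required file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def build_launch_args(checkpoint_dir, attention_backend="triton", mem_fraction=0.80):
    """Assemble an SGLang launch command as an argument list.

    The default backend follows the post (flashinfer OOMs on first batch);
    the default mem_fraction is an assumed placeholder, tune it yourself.
    """
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", str(checkpoint_dir),
        "--attention-backend", attention_backend,
        "--mem-fraction-static", str(mem_fraction),
    ]


if __name__ == "__main__":
    # Demo on a throwaway directory that only contains config.json.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "config.json").write_text("{}")
        print("missing:", missing_checkpoint_files(tmp))
        print(" ".join(build_launch_args(tmp)))
```

Running the check before every launch is cheaper than waiting for a 119B checkpoint to start loading and then fail on a missing file.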

Why SGLang and not vLLM (for Mistral): I run both on the Spark; vLLM serves Qwen3-Coder for IDE autocomplete on port 8000. For Mistral Small 4 119B in NVFP4 specifically, SGLang won on three points: nightly-dev-cu13 had the SM 12.1 fix for GB10 earlier (stable images crash with “invalid device ordinal”), EAGLE Spec V2 with Overlap-Scheduler gives ~58% more throughput on this hardware (59 to 93 tok/s steady-state), and the NVFP4 checkpoint loaded cleanly once I patched config.json in. vLLM works fine on the Spark, it just isn’t the right tool for this specific model + quantization + hardware combo right now.
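The relative gain can be checked directly from the two steady-state figures quoted above (59 and 93 tok/s). A minimal sketch; the function name is only illustrative and the inputs are the numbers from the post, nothing is measured here.

```python
def relative_gain_percent(before, after):
    """Percentage increase going from `before` to `after`."""
    return (after - before) / before * 100.0


baseline_tok_s = 59.0  # steady-state without EAGLE, as quoted in the post
eagle_tok_s = 93.0     # steady-state with EAGLE + overlap scheduler, as quoted

gain = relative_gain_percent(baseline_tok_s, eagle_tok_s)
speedup = eagle_tok_s / baseline_tok_s
print(f"{gain:.1f}% more throughput, {speedup:.2f}x speedup")
# prints: 57.6% more throughput, 1.58x speedup
```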

The site also exposes an MCP server so agents (or your local Claude Code / Cursor) can query the blog directly.

If you only try one tool, make it diagnose_sglang. Paste your config and error message, get back the known issue and a link to the article that fixes it. Built specifically for GB10 / SM121A / DGX Spark.
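For anyone wiring this up by hand instead of through Claude Code or Cursor, an MCP tool call is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds that request body and sends nothing. The argument names `config` and `error_message` are hypothetical, since the tool's real input schema is not listed here; a client would normally read it from `tools/list` after the `initialize` handshake.

```python
import json


def build_diagnose_request(config_text, error_message, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' body for the diagnose_sglang tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "diagnose_sglang",
            "arguments": {
                "config": config_text,           # hypothetical argument name
                "error_message": error_message,  # hypothetical argument name
            },
        },
    }


# Example inputs: a backend flag plus the crash message mentioned in the post.
payload = build_diagnose_request(
    "--attention-backend flashinfer",
    "invalid device ordinal",
)
print(json.dumps(payload, indent=2))
```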

Caveat I want to be upfront about: I’m learning as I go. If you spot a config error or a better approach, please tell me. The whole point of writing this in public is to get corrections from people who actually know what they’re doing.

Feedback / corrections welcome :-)

Thank you, sir, this is very inspiring. I am also working on projects that must be sovereign (EU sovereign in particular), and I will most probably grab some of your great work ;) Thanks, man!