Open-source recipe + scaffold: training a DSpark-class speculative-decoding draft for Nemotron

DeepSeek’s new DSpark / DeepSpec trainer ships with target support for Qwen3 and Gemma only. A lot of us here run Nemotron on Spark, so I built the missing piece and open-sourced it.

Disclosure up front: I’m Rhyan Neble, founder of Extended Systems Intelligence (XSI). We do most of our Nemotron work on DGX Spark, and this came out of it.

Repo (Apache-2.0): Extended-Systems-Intelligence/nemotron-dspark-recipe on GitHub. Community recipe + reference scaffold for training a DSpark-class speculative-decoding draft model for NVIDIA Nemotron, extending DeepSeek's DeepSpec.

What it gives you:

  • The four DeepSpec extension points wired for a Nemotron target — a chat template, a draft-config builder that maps Nemotron’s transformer dimensions into the draft, a NemotronDSparkTrainer, and a worked config.
  • A step-by-step recipe: data prep → target cache → draft training → eval → serving, with notes for NVIDIA cloud GPUs.
  • Field notes on the Nemotron-H hybrid Mamba-Transformer checkpoints — where the stock Hugging Face generate/cache path gets in the way of hidden-state extraction during the cache stage, and the use_cache=False + output_hidden_states path that works. That’s the part that cost us time; it’s written down so it doesn’t cost you any.
  • A no-GPU selftest.py to catch integration breakage before a long run.
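
For anyone who wants the shape of the use_cache=False + output_hidden_states path before opening the repo, here is a minimal sketch. It is not the repo's code: extract_target_hidden_states and FakeModel are hypothetical names for illustration, and the indexing assumes the usual Hugging Face convention (entry 0 of hidden_states is the embedding output, entry i is the output of layer i), which is worth verifying against the Nemotron-H remote code you actually load. In a real run, model would be the loaded target checkpoint, input_ids a token tensor, and the call would sit under torch.no_grad().

```python
from types import SimpleNamespace


def extract_target_hidden_states(model, input_ids, target_layer_ids):
    """Run one plain forward pass and return the selected layers' activations.

    The point of the sketch: call the model directly instead of generate(),
    and turn the cache off so no KV/state cache object is built.
    """
    out = model(
        input_ids=input_ids,
        use_cache=False,            # skip the cache path entirely
        output_hidden_states=True,  # ask for per-layer hidden states
    )
    hidden_states = out.hidden_states
    # Assumed convention: index 0 = embeddings, index i = output of layer i.
    return {i: hidden_states[i] for i in target_layer_ids}


class FakeModel:
    """Hypothetical stand-in so the sketch runs with no GPU and no checkpoint."""

    num_layers = 4

    def __call__(self, input_ids, use_cache, output_hidden_states):
        assert use_cache is False and output_hidden_states is True
        states = tuple(f"h{i}" for i in range(self.num_layers + 1))
        return SimpleNamespace(hidden_states=states)


selected = extract_target_hidden_states(FakeModel(), [[1, 2, 3]], [1, 2, 4])
print(sorted(selected))
```

The returned dict is keyed by layer index, so whatever you end up using for target_layer_ids maps straight onto what gets written to the target cache.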

What it’s not: a benchmarked checkpoint. The scaffold is written against DeepSpec’s real interfaces but hasn’t been trained end-to-end and tuned yet — it’s a starting point. If you train a draft with it on Spark, I’d like to see your accepted-length numbers and target_layer_ids — open an issue or reply here.

Hi Rhyan,

Thanks for sharing.

Have you tried using it? What inference engine are you using, and are there any benefits vs. an EAGLE drafter?

I’m currently training on my Sparks. I’m hoping to post my own numbers later this week. Once I get it trained up, I will share it along with any field notes or changes to the scaffolding.

DeepSeek reports DSpark accepting 26.7–30.9% more tokens than EAGLE-3, but that's their number on Qwen3, and it measures accepted length, not end-to-end tok/s. Promising on paper; needs independent benchmarking. Our recipe isn't a bet against EAGLE. DeepSpec, the framework underneath, ships both EAGLE-3 and DSpark, and the Nemotron extension is the same four touchpoints either way. So the recipe can train an EAGLE-3 draft for Nemotron too (we did DSpark first). I'll add a NemotronEagle3Trainer to the repo if that would help; just let me know. Either way, I will train and benchmark both as soon as I can, so stay tuned.
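
To make the accepted-length vs. tok/s gap concrete, here is a back-of-envelope sketch. It uses a deliberately simple cost model that is my assumption, not anything from DeepSeek or the repo: each speculation round emits accepted_length tokens and costs one target verification pass plus the drafting work, all measured in units of one ordinary target decode step. The baseline length of 3.0 and the draft costs of 0.3 and 0.6 are made-up illustration values; only the 26.7% figure comes from the reported range.

```python
def estimated_speedup(accepted_length, draft_cost, verify_cost=1.0):
    """Tokens emitted per round divided by the round's cost.

    Costs are relative to one plain target decode step, so a return value
    of 2.0 means roughly twice the tokens per unit of target-step time.
    """
    if accepted_length <= 0 or draft_cost < 0 or verify_cost <= 0:
        raise ValueError("lengths and costs must be positive")
    return accepted_length / (draft_cost + verify_cost)


baseline_len = 3.0                   # hypothetical accepted length for the baseline draft
improved_len = baseline_len * 1.267  # 26.7% longer, low end of the reported range

baseline = estimated_speedup(baseline_len, draft_cost=0.3)

# Case 1: the new draft costs the same to run as the baseline draft.
same_cost_gain = estimated_speedup(improved_len, draft_cost=0.3) / baseline

# Case 2: the new draft is twice as expensive to run per round.
higher_cost_gain = estimated_speedup(improved_len, draft_cost=0.6) / baseline

print(f"same draft cost:   {same_cost_gain:.3f}x over baseline")
print(f"double draft cost: {higher_cost_gain:.3f}x over baseline")
```

Under this toy model the accepted-length gain carries over in full only when the draft is no more expensive to run; a costlier draft can eat most of it. That is why the end-to-end numbers on Spark matter more than the accepted-length figure alone.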