DeepSeek’s new DSpark / DeepSpec trainer ships with target support for Qwen3 and Gemma only. A lot of us here run Nemotron on Spark, so I built the missing piece and open-sourced it.
Disclosure up front: I’m Rhyan Neble, founder of Extended Systems Intelligence (XSI). We do most of our Nemotron work on DGX Spark, and this came out of it.
What it gives you:
- The four DeepSpec extension points wired for a Nemotron target: a chat template, a draft-config builder that maps Nemotron’s transformer dimensions into the draft, a NemotronDSparkTrainer, and a worked config.
- A step-by-step recipe: data prep → target cache → draft training → eval → serving, with notes for NVIDIA cloud GPUs.
- Field notes on the Nemotron-H hybrid Mamba-Transformer checkpoints: where the stock Hugging Face generate/cache path gets in the way of hidden-state extraction during the cache stage, and the use_cache=False + output_hidden_states path that works. That’s the part that cost us time; it’s written down so it doesn’t cost you any.
- A no-GPU selftest.py to catch integration breakage before a long run.
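
To make the use_cache=False + output_hidden_states point concrete, here is a minimal sketch of the idea, not the code from the repo. It assumes the target follows the usual Hugging Face forward-call convention (keyword arguments `use_cache`, `output_hidden_states`, `return_dict`, and an output object with a `hidden_states` tuple). The function name `extract_target_hidden_states` and the `layer_ids` argument are made up for illustration. Check how your checkpoint numbers the entries of `hidden_states` (many Hugging Face decoders put the embedding output at index 0) before picking layer ids.

```python
from typing import Any, Mapping, Sequence


def extract_target_hidden_states(
    model: Any,
    batch: Mapping[str, Any],
    layer_ids: Sequence[int],
) -> list:
    """Return the hidden states at `layer_ids` from one plain forward pass.

    Hypothetical helper: `model` is anything callable with Hugging Face style
    keyword arguments, `batch` holds tensors such as input_ids and
    attention_mask, and `layer_ids` index into the returned tuple.
    """
    out = model(
        **batch,
        use_cache=False,            # do not build a KV/state cache
        output_hidden_states=True,  # request per-layer activations
        return_dict=True,           # get an object with .hidden_states
    )
    hidden = out.hidden_states
    if hidden is None:
        raise RuntimeError("model did not return hidden states")
    # Pick only the layers the draft is trained against.
    return [hidden[i] for i in layer_ids]
```

With a real checkpoint you would call this inside `torch.no_grad()` on a model loaded through `AutoModelForCausalLM.from_pretrained(...)`, and write the returned tensors to the target cache. A single forward call like this never goes through `generate()`, which is the point of the workaround.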
What it’s not: a benchmarked checkpoint. The scaffold is written against DeepSpec’s real interfaces but hasn’t been trained end-to-end and tuned yet — it’s a starting point. If you train a draft with it on Spark, I’d like to see your accepted-length numbers and target_layer_ids — open an issue or reply here.
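
If you do report accepted length, it helps to say how you counted. A small sketch under my own assumptions, not DeepSpec's eval code: `accepted_per_step` is the number of draft tokens the target accepted at each verification step, and `count_bonus_token` controls whether the one token the target emits itself at every step is included. Both names are made up for illustration.

```python
from statistics import mean
from typing import Sequence


def mean_accepted_length(
    accepted_per_step: Sequence[int],
    count_bonus_token: bool = True,
) -> float:
    """Average tokens gained per target verification step."""
    if not accepted_per_step:
        raise ValueError("need at least one verification step")
    # Each step yields the accepted draft tokens, plus (optionally) the
    # single token the target model contributes itself.
    bonus = 1 if count_bonus_token else 0
    return float(mean(n + bonus for n in accepted_per_step))
```

For example, `mean_accepted_length([3, 1, 4, 0])` averages 4, 2, 5 and 1; pass `count_bonus_token=False` to report accepted draft tokens only.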