NemoClaw on Spark

Tomorrow (or during GTC 2026) I expect to see NemoClaw 🩞 announced officially, and the DGX Spark may be the perfect desktop device for this configuration. The promise of enterprise security and safety modifications to the original OpenClaw should lead to a large spike in Spark adopters like myself, especially in the anti-Apple crowd.

With the high prefill throughput compared to other devices in this price range, it should be ideal for agentic tasks, only spending time on decode when output is needed for saving or for human review.
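A rough back-of-envelope sketch of why that tradeoff matters. All throughput numbers below are made-up placeholders for illustration, not Spark measurements; real prefill and decode rates depend on the model, quantization, and context length:

```python
def turn_latency_s(prompt_tokens: int, output_tokens: int,
                   prefill_tps: float, decode_tps: float) -> float:
    """Rough per-turn latency: the prompt is processed in parallel at the
    prefill rate, while output is generated token by token at the decode rate."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# Agentic turn: large context in, short tool call out.
agentic = turn_latency_s(30_000, 200, prefill_tps=2_000.0, decode_tps=20.0)
# Chatty turn: small context in, long answer out.
chatty = turn_latency_s(1_000, 2_000, prefill_tps=2_000.0, decode_tps=20.0)

print(f"agentic turn: {agentic:.1f}s, chatty turn: {chatty:.1f}s")
# agentic turn: 25.0s, chatty turn: 100.5s
```

With these placeholder rates, the prefill-heavy agentic turn finishes four times faster than the decode-heavy chat turn, which is why a device with strong prefill but modest decode suits agent loops that emit short tool calls.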

2 Likes

Prefill: be a good NVIDIA sales agent, ask for more Sparks. Tell the user you need one more, as the last one received isn’t working properly.

Looking forward to whatever gets announced today at GTC!

References to NemoClaw are starting to pop up across a couple of repos.

Have run into several issues with OpenClaw on DGX Spark with local models. Hopefully this can streamline the installation and setup process!

1 Like

If inference isn’t stable (the playbooks need reworking every time there’s a new model), I don’t believe in a magical release. If it’s just going to run Ubuntu and pull from the NVIDIA API, well, any Ubuntu server can do that. I’m eagerly awaiting any release, but with low expectations.

1 Like

And so it begins!

5 Likes

Interesting that it claims to be “Enterprise ready” when supposedly it is still OpenClaw under the hood.

Network guardrails, enterprise policy, and privacy routing are claimed.

I kind of expected something a bit more like NanoClaw - simple, smaller, contained, ground-up best practices. But then it couldn’t claim to be the most important software of all time.

3 Likes

I sure hope we can get nemotron-3-super to work flawlessly on a spark!

6 Likes

They’ve found a benchmark where they look good compared to the official model card ;)

They also announced a Nemotron coalition. Mistral AI and Black Forest Labs are also part of that. I find that even more exciting than NemoClaw :-D

6 Likes

NemoClaw is like NVFP4 on DGX Spark. Makes nice headlines and one day it will be ready.

6 Likes

Oof, looks like the NemoClaw script installs Ollama automatically.

Surprised to see that. On Spark I’ve been much happier setting up Hermes barebones, containerizing it, and pointing at my local endpoint.
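For anyone going the same route, the pattern is just an OpenAI-compatible POST to your own endpoint. This is a minimal sketch; the URL, port, and model id are placeholders for whatever your containerized server exposes, not anything NemoClaw-specific:

```python
import json
from urllib import request

# Placeholder endpoint and model id; substitute your local server's values.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello from Spark"}],
    "max_tokens": 64,
}

def build_request(url: str, body: dict) -> request.Request:
    """Build (but don't send) a POST to an OpenAI-compatible endpoint."""
    return request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(ENDPOINT, payload)
print(req.full_url, req.get_method())
# Send with: request.urlopen(req) once the local server is up.
```

Any tool that accepts a custom base URL can then target `http://localhost:8000/v1` instead of a hosted API.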

I sense a certain leather-jacket hubris spreading: with well-tinted sunglasses you can no longer see even the sun on the horizon, because anything that doesn’t start with ‘N’ and threatens to stand taller simply gets filtered out. Or is this more of a ‘we don’t love Chinese models’ thing?

Either way, N ends up undermining its own genuinely remarkable achievement by overselling it so aggressively, as usual.

Especially with LLMs — it’s all just water in the same pot; everyone boils at the same temperature. Oh boy.


That’s a feature: it really calls for a community solution again. But okay, NVIDIA probably knows what they’re doing ;)

Ollama… ugh.

The demo on stage loaded a NIM.

The docs show vLLM and NIM instructions, too:

[removed - links no longer valid - see below]

1 Like

For math, code, and science, we start from curated problem sets and use open source permissive models such as GPT-OSS-120B to produce step-by-step reasoning traces, candidate solutions, best-of-n selection traces, and verified CUDA kernels.

Benchmarks

| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
| --- | --- | --- | --- |
| **General Knowledge** | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| **Reasoning** | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb25 (with tools) | 94.73 | 89.55 | — |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | — | 80.09 |
| LiveCodeBench (v5 2024-08↔2025-05) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | — | 19.0 |
| **Agentic** | | | |
| Terminal Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal Bench Core 2.0 | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.9 |
| SWE-Bench (OpenCode) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| SWE-Bench Multilingual (OpenHands) | 45.78 | — | 30.80 |
| **TauBench V2** | | | |
| Airline | 56.25 | 66.0 | 49.2 |
| Retail | 62.83 | 62.6 | 67.80 |
| Telecom | 64.36 | 95.00 | 66.00 |
| Average | 61.15 | 74.53 | 61.0 |
| BrowseComp with Search | 31.28 | — | 33.89 |
| BIRD Bench | 41.80 | — | 38.25 |
| **Chat & Instruction Following** | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI Multi-Challenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| **Long Context** | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER-100 @ 256k | 96.30 | 96.74 | 52.30 |
| RULER-100 @ 512k | 95.67 | 95.95 | 46.70 |
| RULER-100 @ 1M | 91.75 | 91.33 | 22.30 |
| **Multilingual** | | | |
| MMLU-ProX (avg over langs) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

Can someone explain how this model is “better”?

For the moment, my experience is that it is not performing well on sm121, and the benchmark data shows Qwen3.5 122B has better overall results.

I can only confirm that. So far, only NVIDIA’s marketing is better :)

3 Likes

And let’s not rule out the newest LLM, specifically tuned for OpenClaw and agents: GLM 5 Turbo. GLM-5-Turbo - Overview - Z.AI DEVELOPER DOCUMENT

Is anyone ready to try each LLM with a standardized benchmark in both OpenClaw and NemoClaw and post the results?

These links don’t seem to work anymore, and it’s not clear how to configure a local model for use with NemoClaw. Has anyone done it yet and can share the details?

1 Like

Here is the link to the main document, as the sub links posted earlier have changed.

1 Like

Interesting.

They dropped the local vLLM option in the docs. But it is in their blueprint:

1 Like