Has anyone had any good experience running on DGX Spark with clawdbot?

I started trying it out on a local throw-away sandbox VM, using Minimax m2.1 on dual Spark. I got it up and running but I didn’t get too far with it; it seems like a lot of tweaking would be necessary (maybe it works better with Opus 4.5 - not sure).

Here is the clawdbot config that will save you a ton of time if you try it with Minimax - took me hours to figure out why I was getting tool call XML back as responses until I landed on this:

"models": {
  "mode": "merge",
  "providers": {
    "local": {
      "baseUrl": "http://[server]:8443/v1",
      "apiKey": "[key]",
      "api": "openai-completions",
      "models": [
        {
          "id": "minimax-m2.1",
          "name": "MiniMax M2.1",
          "reasoning": true,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 131072,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "local/minimax-m2.1"
    },
    "models": {
      "local/minimax-m2.1": {
        "alias": "Minimax"
      }
    },
    "workspace": "/home/agentuser/clawd",
    "compaction": {
      "mode": "safeguard"
    },
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},
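
For anyone adapting the fragment above, here is a minimal Python sketch of a sanity check before starting the agent. It assumes the fragment is wrapped in braces so it parses as a complete JSON object, and that the primary model string follows the provider/model-id pattern seen in the config. The function name check_primary_model is hypothetical and not part of clawdbot; the embedded config is a trimmed copy for illustration.

```python
import json

# A trimmed copy of the config above, wrapped in braces so it parses as JSON.
CONFIG_TEXT = """
{
  "models": {
    "mode": "merge",
    "providers": {
      "local": {
        "baseUrl": "http://[server]:8443/v1",
        "apiKey": "[key]",
        "api": "openai-completions",
        "models": [{"id": "minimax-m2.1", "name": "MiniMax M2.1"}]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {"primary": "local/minimax-m2.1"}
    }
  }
}
"""


def check_primary_model(config: dict) -> bool:
    """Return True if agents.defaults.model.primary names a defined provider and model.

    Assumes the "provider/model-id" naming seen in the config above.
    """
    primary = config["agents"]["defaults"]["model"]["primary"]
    provider_name, _, model_id = primary.partition("/")
    provider = config["models"]["providers"].get(provider_name)
    if provider is None:
        return False
    return any(m.get("id") == model_id for m in provider.get("models", []))


config = json.loads(CONFIG_TEXT)
print(check_primary_model(config))
```

A mismatch between the primary string and the provider or model id is an easy typo to make when editing by hand, so a check like this can rule it out before digging into deeper issues.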

hey i appreciate this! was wanting to put something together this weekend so this definitely kickstarts that. thank you!

I got it set up and running with bare-minimum functionality. Since I have this hardware, I focused on using a local LLM. The local LLM part works great so far! MiniMax-m2.1 is a cloud version. You would need to pay an API fee even if your usage doesn’t require a fancy model.

I was able to get it working with MiniMax m2.1 on DGX spark using the config above, but it does require a cluster of 2 using eugr’s vllm docker.

I’m also curious whether anyone has done testing. I’ve been running 4.7 Flash, since that’s what some generic research with Claude suggested.

But compared to just running it with Opus 4.5, you have to tame verbosity and issues with tool calling. It’s not optimal yet, and the full “offline” experience is definitely not yet comparable.

I’d be curious to hear suggestions or experience reports from others! I’ll continue to work on it for now.

Since I only have one DGX Spark, I’m using GLM 4.7 Flash Q4. Honestly, I’m quite impressed. I spent half a day letting it self-develop and add its own features:

  • Google Calendar management
  • System screenshots with automatic Slack delivery

What blew me away was that I didn’t have to intervene at all during the process (except for a one-time Google API authentication). I’m thrilled to see this kind of performance from a local model. One thing to note is that the permissions are quite excessive; it definitely needs a dedicated account and an isolated VM to keep my data secure.

I’m curious to know—what other models have you all been using to get great results?

@CopybaraS

Did you turn off hibernation before getting the successful result described above?

Yeah, hibernation is definitely disabled—on both the DGX Spark and the VM running OpenClaw. Stability is priority one for running agents.

Just to clarify so there’s no misunderstanding: it wasn’t a single-shot autonomous run. It actually took several rounds of prompting/dialogue to get there. However, the fact that I didn’t have to manually write a single line of code or tweak the tool integration myself was what really blew me away.

To be honest, compared to cloud-based SOTA models like Kimi k2.5, GLM 4.7 Flash’s reasoning isn’t quite at that level yet. But seeing this kind of performance running entirely locally? I’m more than satisfied.

I’m honestly a bit surprised that GLM 4.7 Flash is working well. On my Spark, I find token throughput pretty slow, and tool calls struggle (I should look into installing another quantization of it). For coding tasks, Qwen 30B Coder has been working really well for me.

My current setup is Antigravity for authentication, using that as the brain layer. I run Qwen 30B Coder locally as the coding model, while the brain model is Sonnet 4.5. This split has been giving me the best balance so far.

Thanks @CopybaraS

I’m delighted to share a small accomplishment. Using GLM‑4.7‑Flash, xxx was able to extract information from a specified section of a webpage via a link, generate a table, and even sort the data—all without me having to include explicit instructions in my prompt. Originally, xxx told me it couldn’t perform this extraction, but surprisingly it delivered results better than I expected!

Wow, I’ve been thinking about trying that ‘hybrid’ split-model approach too! It definitely seems like the most balanced setup right now. However, I’m really pushing to see if a Full Local workflow can come close to that kind of performance.

Before GLM 4.7 Flash, Qwen Coder 30B was my go-to, but lately, I’ve found that GLM 4.7 Flash outperforms it across the board (I’m currently running a Roo Code + Local Model setup).

To share some data from my Asus GX10 (DGX Spark): I’m running GLM 4.7 Flash (q4_k_m) via Ollama, and I’m getting speeds around PP: 2000 t/s and TG: 55 t/s. While the token generation (TG) is a bit slower than Qwen, it’s still very acceptable for me. I stick with Ollama mainly because it handles memory release so well, allowing me to free up VRAM instantly for other concurrent tests on the Spark.
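
To put those PP and TG figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes response time is simply prompt tokens divided by PP plus generated tokens divided by TG, ignoring model load time, network, and tool execution. The request sizes are made up for illustration, and estimate_seconds is a hypothetical helper.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_tps: float = 2000.0, tg_tps: float = 55.0) -> float:
    """Rough time: prompt processing plus token generation, ignoring other overhead."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Hypothetical request: 8,000 prompt tokens, 550 generated tokens.
# 8000 / 2000 = 4 s of prompt processing, 550 / 55 = 10 s of generation.
print(estimate_seconds(8000, 550))
```

With numbers like these, generation speed dominates for anything but very long prompts, which is why the TG figure matters more for agent loops that produce a lot of output.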

@CopybaraS
Have you tried nemotron-3-nano:latest? I’ve used it without integrating it into openclaw. I was quite satisfied with its answers in writing, finding info without specific web sites, etc. I haven’t looked closely at its coding capability, so I’m not sure whether that is why it isn’t a recommended local model for integration with Claude Code.

@Alexander-F

Does your term “split-model” mean using multiple models in openclaw? I am thinking of trying it out later. I’d appreciate it if you could share your config. And are you using Skills to specify the model for each task? Thanks!

I’ve actually tried nemotron-3-nano:latest as well. I think it has some of the best logic among local models under 120B, and it was actually the first model I tested with OpenClaw.

However, its overall performance didn’t quite reach the level of GLM 4.7 Flash. I also found that Nemotron-3-Nano didn’t work particularly well when paired with Roo Code. That being said, it’s still my favorite model for daily chatting within Open WebUI.

Can assign it for certain tasks then. :-)

Hey @mmos

I am using your configuration to run the TUI. Do you happen to know how to disable the responses from showing the in their content?

I was seeing tool call XML responses until I changed my config to the above. Compare yours closely, make the same changes, and it should work.
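
If you want to confirm whether raw tool-call markup is still leaking into responses after a config change, a heuristic check along these lines might help. This is only a sketch: the tag shapes in the pattern are guesses at what leaked markup could look like, not a specification of any model's output format, and looks_like_raw_tool_call is a hypothetical helper.

```python
import re

# Assumed tag shapes: these patterns are guesses at what leaked tool-call
# markup might look like, not a specification of any model's format.
TOOL_CALL_PATTERN = re.compile(r"<[\w:.-]*tool_call\b|<invoke\b", re.IGNORECASE)


def looks_like_raw_tool_call(text: str) -> bool:
    """Return True if the response text appears to contain unparsed tool-call XML."""
    return TOOL_CALL_PATTERN.search(text) is not None


# Made-up sample responses for illustration.
print(looks_like_raw_tool_call('<minimax:tool_call><invoke name="x">'))
print(looks_like_raw_tool_call("Here is the calendar summary."))
```

Running it over a few saved responses before and after the config change gives a quick signal of whether the server is parsing tool calls or passing them through as plain text.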

I failed to install Open Claw on an old 2014 Mac mini (OS too old, plus dependency issues) and also on the DGX Spark (ARM64 issues).

But I successfully installed it on a NUC10 in a Proxmox LXC Container.
It uses the LLM installed on the DGX as its “brain”.

I tried it with GPT OSS 120b, but renaming images based on their content failed because the model is text-only, so I use Qwen 3 VL instead.

I like this setup because it cannot mess with the system too much, and even if it does, I can just restore it from a backup of the container.

Is there any good (smart) “Any to Any” model that runs well on the Spark?

Does “smart” mean coding capability, or something else?