Has anyone had any good experience running on DGX Spark with clawdbot?
I started to try it out on a local throw-away sandbox VM, using MiniMax M2.1 on dual Spark. I got it up and running but I didn't get too far with it; it seems like a lot of tweaking would be necessary (maybe it works better with Opus 4.5, not sure).
Here is the clawdbot config that will save you a ton of time if you try it with Minimax - took me hours to figure out why I was getting tool call XML back as responses until I landed on this:
"models": {
  "mode": "merge",
  "providers": {
    "local": {
      "baseUrl": "http://[server]:8443/v1",
      "apiKey": "[key]",
      "api": "openai-completions",
      "models": [
        {
          "id": "minimax-m2.1",
          "name": "MiniMax M2.1",
          "reasoning": true,
          "input": [
            "text"
          ],
          "cost": {
            "input": 0,
            "output": 0,
            "cacheRead": 0,
            "cacheWrite": 0
          },
          "contextWindow": 131072,
          "maxTokens": 8192
        }
      ]
    }
  }
},
"agents": {
  "defaults": {
    "model": {
      "primary": "local/minimax-m2.1"
    },
    "models": {
      "local/minimax-m2.1": {
        "alias": "Minimax"
      }
    },
    "workspace": "/home/agentuser/clawd",
    "compaction": {
      "mode": "safeguard"
    },
    "maxConcurrent": 4,
    "subagents": {
      "maxConcurrent": 8
    }
  }
},
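If you adapt the fragment above, a quick consistency check can catch a mistyped model reference before you start the agent. The sketch below is only that: it assumes references resolve as `provider/id` (inferred from `local/minimax-m2.1` in the config, not from any documentation), and the helper names `defined_model_refs` and `find_problems` are made up for illustration. The embedded config is a trimmed copy of the fragment, wrapped in braces so it parses as JSON.

```python
import json

# Trimmed copy of the fragment above, wrapped in braces so it is valid JSON.
CONFIG_TEXT = """
{
  "models": {
    "providers": {
      "local": {
        "models": [
          {"id": "minimax-m2.1", "contextWindow": 131072, "maxTokens": 8192}
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {"primary": "local/minimax-m2.1"},
      "models": {"local/minimax-m2.1": {"alias": "Minimax"}}
    }
  }
}
"""


def defined_model_refs(config):
    """Collect 'provider/id' strings for every model under models.providers."""
    refs = set()
    providers = config.get("models", {}).get("providers", {})
    for provider_name, provider in providers.items():
        for model in provider.get("models", []):
            refs.add(f"{provider_name}/{model['id']}")
    return refs


def find_problems(config):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    known = defined_model_refs(config)
    defaults = config.get("agents", {}).get("defaults", {})

    # Assumption: the primary model must match a defined 'provider/id' pair.
    primary = defaults.get("model", {}).get("primary")
    if primary not in known:
        problems.append(f"primary model {primary!r} is not defined under providers")

    # Assumption: keys of agents.defaults.models use the same 'provider/id' form.
    for ref in defaults.get("models", {}):
        if ref not in known:
            problems.append(f"model entry {ref!r} is not defined under providers")

    # maxTokens larger than contextWindow can never be satisfied.
    for provider in config.get("models", {}).get("providers", {}).values():
        for model in provider.get("models", []):
            if model.get("maxTokens", 0) > model.get("contextWindow", 0):
                problems.append(f"{model['id']}: maxTokens exceeds contextWindow")
    return problems


config = json.loads(CONFIG_TEXT)
print(find_problems(config))
```

An empty list only means the names line up; it says nothing about whether the server behind `baseUrl` is reachable or returns well-formed tool calls.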
hey i appreciate this! was wanting to put something together this weekend so this definitely kickstarts that. thank you!
I got it up and running with bare-minimum functionality. Since I have this hardware, I focused on setting it up with a local LLM. The local LLM part works great so far! MiniMax-m2.1 is a cloud version. You would need to pay an API fee even if your usage doesn't require a fancy model.
I was able to get it working with MiniMax m2.1 on DGX Spark using the config above, but it does require a cluster of 2 using eugr's vllm docker.
I'm also curious if anyone has done testing. I've been running the 4.7 Flash, since that's what some generic research with Claude suggested.
But compared to just running it with Opus 4.5, you have to tame verbosity and issues with tool calling. It's not optimal yet, and the full "offline" experience is definitely not yet comparable.
I'd be curious to hear suggestions or experience reports from others! I'll continue to work on it for now.
Since I only have one DGX Spark, I'm using GLM 4.7 Flash Q4. Honestly, I'm quite impressed. I spent half a day letting it self-develop and add its own features:
- Google Calendar management
- System screenshots with automatic Slack delivery
What blew me away was that I didn't have to intervene at all during the process (except for a one-time Google API authentication). I'm thrilled to see this kind of performance from a local model. One thing to note is that the permissions are quite excessive; it definitely needs a dedicated account and an isolated VM to keep my data secure.
I'm curious to know: what other models have you all been using to get great results?
Did you turn off hibernation before getting the success story above?
Yeah, hibernation is definitely disabled, on both the DGX Spark and the VM running OpenClaw. Stability is priority one for running agents.
Just to clarify so there's no misunderstanding: it wasn't a single-shot autonomous run. It actually took several rounds of prompting/dialogue to get there. However, the fact that I didn't have to manually write a single line of code or tweak the tool integration myself was what really blew me away.
To be honest, compared to cloud-based SOTA models like Kimi k2.5, GLM 4.7 Flash's reasoning isn't quite at that level yet. But seeing this kind of performance running entirely locally? I'm more than satisfied.
I'm honestly a bit surprised that GLM 4.7 Flash is working well. On my Spark, I find token throughput pretty slow, and tool calls struggle (I should look into installing another quantization of it). For coding tasks, Qwen 30B Coder has been working really well for me.
My current setup is Antigravity for authentication, using that as the brain layer. I run Qwen 30B Coder locally as the coding model, while the brain model is Sonnet 4.5. This split has been giving me the best balance so far.
Thanks @CopybaraS
I'm delighted to share a small accomplishment. Using GLM-4.7-Flash, xxx was able to extract information from a specified section of a webpage via a link, generate a table, and even sort the data, all without me having to include explicit instructions in my prompt. Originally, xxx told me it couldn't perform this extraction, but surprisingly it delivered results better than I expected!
Wow, I've been thinking about trying that "hybrid" split-model approach too! It definitely seems like the most balanced setup right now. However, I'm really pushing to see if a Full Local workflow can come close to that kind of performance.
Before GLM 4.7 Flash, Qwen Coder 30B was my go-to, but lately, I've found that GLM 4.7 Flash outperforms it across the board (I'm currently running a Roo Code + Local Model setup).
To share some data from my Asus GX10 (DGX Spark): I'm running GLM 4.7 Flash (q4_k_m) via Ollama, and I'm getting speeds around PP: 2000 t/s and TG: 55 t/s. While the token generation (TG) is a bit slower than Qwen, it's still very acceptable for me. I stick with Ollama mainly because it handles memory release so well, allowing me to free up VRAM instantly for other concurrent tests on the Spark.
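To put those two numbers in perspective, here is a rough back-of-the-envelope sketch of how long a single agent turn would take at PP 2000 t/s and TG 55 t/s. It assumes both rates stay constant regardless of context length, which is unlikely to hold for long prompts, and the function name `estimate_seconds` and the example token counts are made up for illustration.

```python
def estimate_seconds(prompt_tokens, output_tokens, pp_tps=2000.0, tg_tps=55.0):
    """Estimate wall-clock time for one request.

    pp_tps: prompt processing (prefill) speed in tokens per second.
    tg_tps: token generation (decode) speed in tokens per second.
    Returns (prefill_seconds, generation_seconds, total_seconds).
    """
    prefill = prompt_tokens / pp_tps
    generation = output_tokens / tg_tps
    return prefill, generation, prefill + generation


# Hypothetical agent turn: 20,000 tokens of context, 500 tokens of reply.
prefill, generation, total = estimate_seconds(20_000, 500)
print(f"prefill {prefill:.1f}s, generation {generation:.1f}s, total {total:.1f}s")
```

For that hypothetical turn, prefill takes about 10 seconds and generation about 9 seconds, so with agent-sized contexts the prompt processing speed matters about as much as the generation speed.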
@CopybaraS
Have you tried nemotron-3-nano:latest? I've used it without integrating it into OpenClaw. I was quite satisfied with its answers in writing, finding info without specific websites, etc. I didn't research its coding capability much; I'm not sure whether coding capability is the reason it isn't a recommended local model for integration with Claude Code.
Does your term "split-model" mean using multiple models in OpenClaw? I am thinking of trying it out later. I'd appreciate it if you'd share your config. And are you using Skills to specify the model for a given task? Thanks!
I've actually tried nemotron-3-nano:latest as well. I think it has some of the best logic among local models under 120B, and it was actually the first model I tested with OpenClaw.
However, its overall performance didn't quite reach the level of GLM 4.7 Flash. I also found that Nemotron-3-Nano didn't work particularly well when paired with Roo Code. That being said, it's still my favorite model for daily chatting within Open WebUI.
Can assign it for certain tasks then. :-)
Hey @mmos
I am using your configuration to run the TUI. Do you happen to know how to disable the responses from showing the in their content?
I was seeing tool call XML responses until I changed my config to the above. Compare yours closely and make the same changes, and it should work.
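If you want to check whether raw tool-call markup is still leaking into the reply text, a crude heuristic like the one below can help while you compare configs. This is a sketch under assumptions: the tag names it looks for (`tool_call`, `invoke`, `function_call`, optionally with a namespace prefix) are guesses at what a model might emit, not a documented format, so adjust the pattern to whatever your server actually returns. The function name `looks_like_leaked_tool_call` is made up for illustration.

```python
import re

# Assumption: leaked tool calls show up as XML-like tags such as
# <tool_call>, <invoke ...> or <prefix:tool_call>. Adjust to your server's output.
LEAK_PATTERN = re.compile(
    r"<\s*/?\s*(?:[\w.-]+:)?(?:tool_call|invoke|function_call)\b"
)


def looks_like_leaked_tool_call(content):
    """Return True if the reply text appears to contain raw tool-call markup."""
    return bool(LEAK_PATTERN.search(content))


samples = [
    'Sure.<minimax:tool_call><invoke name="read_file">',  # hypothetical leaked markup
    "Here is the table you asked for.",
    "I will invoke the calendar tool next.",  # plain prose, no tag
]
for text in samples:
    print(looks_like_leaked_tool_call(text), "-", text)
```

A properly parsed tool call would normally arrive as structured data rather than as text in the reply, so a hit here suggests the server or client is not parsing the model's tool-call format.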
I failed to install OpenClaw on an old Mac mini 2014 (OS too old and dependency issues) and also on the DGX Spark (ARM64 issues).
But I successfully installed it on a NUC10 in a Proxmox LXC Container.
It uses the LLM installed on the DGX as its "brain".
Tried it with GPT OSS 120b, but it failed at renaming images based on their content because the model is text-only, so I use Qwen 3 VL instead.
I like this setup because it cannot mess too much with the system, and even if it does, I can just restore it from a backup of the container.
Is there any good (smart) "Any to Any" model that runs well on the Spark?
Does "smart" mean coding capability? Or what?