Has anyone had any good experience running on DGX Spark with clawdbot?

What data did this conclusion come from? Do you have the data table to show? Thank you!

GLM 4.7 Flash (Q4 KM) runs great on llama.cpp. To fix the tool-calling issue, make sure you’ve got all the latest updates for both the GGUF file and llama.cpp. Setting the temperature to 0.9 and top_p to 0.95 gives the best results. You’ll also get a big speed boost by turning off thinking with --reasoning_budget 0. Sometimes I get burst speeds of 70-75 tk/s.
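For anyone wanting to reproduce those settings, here’s a minimal llama-server launch sketch. The GGUF filename is a placeholder, and flag spellings can vary between llama.cpp builds (e.g., the reasoning-budget option), so double-check `llama-server --help` on your version:

```shell
# Sketch only: model path is a placeholder, and flag spellings may
# differ across llama.cpp builds -- verify with `llama-server --help`.
llama-server \
  -m ./GLM-4.7-Flash-Q4_K_M.gguf \
  --temp 0.9 \
  --top-p 0.95 \
  --reasoning_budget 0 \
  --host 0.0.0.0 --port 8080
```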

Just to clarify based on my experience: nemotron-3-nano:latest is definitely faster when it comes to inference speed. However, when paired with Roo Code and OpenClaw, it doesn’t perform quite as well as GLM-4.7 Flash.

In my own tests, Nemotron actually showed higher ‘intelligence’ (reasoning capability), so I’ve chosen to use it with Open WebUI. These are just my personal findings, though—different parameters or prompts might yield different results. I’d love to hear if anyone has had a different experience!

I also highly recommend NOT using a local LLM while you’re getting everything set up in OpenClaw. Spend $50 on Anthropic’s latest Opus model to get everything up and running. Let it get its memory database started and let it build out most of your tools. Let it figure out its personality as well.

Then, switch over to your local LLM. I am using MiniMax 2.1 and honestly can’t tell the difference. A little slower, but other than that, response- and personality-wise, pretty much the same as an enterprise model.

Yes, OpenClaw has its security issues, but man, this is the real deal. If you have your agents working around the clock, you are saving $30-$40 a day versus using a cloud provider.
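For a rough sense of where a figure like $30-$40/day could come from, here’s a back-of-envelope sketch. The per-Mtok prices and daily token volumes below are hypothetical placeholders, not real provider pricing:

```python
# Back-of-envelope daily cloud-API cost. All numbers here are
# HYPOTHETICAL placeholders -- substitute your provider's real pricing.
def daily_cloud_cost(in_mtok, out_mtok, in_price, out_price):
    """Dollars per day, given Mtok volumes and $/Mtok prices."""
    return in_mtok * in_price + out_mtok * out_price

# e.g. 1 Mtok in + 0.25 Mtok out per day at assumed $15/$75 per Mtok
print(daily_cloud_cost(1, 0.25, 15, 75))  # 33.75 -- in the $30-$40 range
```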

The future is scary and exciting at the same time.


I’ve been toying with GLM-4.6V-FP8 and GLM-4.5-Air-FP8, serving both through vLLM on dual Ascent GX10s. I chose these two because their size fits the dual configuration well, and in principle they should be great for tool use.

For those interested in trying them, these two models won’t run out of the box because OpenClaw expects a “developer” role in the chat template, while these models seem to support only the “system” role. I’m attaching a working chat template that runs both models, in case it’s useful.

chat_template_glm46v.zip (1.5 KB)
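The general shape of this kind of fix is a chat template that folds the “developer” role into “system” before rendering. The Jinja snippet below is an illustrative sketch (GLM-style role tags assumed), not the actual attached template:

```jinja
{#- Sketch only: alias OpenClaw's "developer" role to "system",
    the privileged role GLM's own template understands. -#}
{%- for message in messages -%}
{%- set role = "system" if message["role"] == "developer" else message["role"] -%}
<|{{ role }}|>
{{ message["content"] }}
{%- endfor -%}
```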

Same here. Before OpenClaw, I had been using Nemotron ever since I started using this DGX Spark system, even switching over from Gemini 3. The thoroughness in finding answers, the smooth tone in its writing, etc. all made me love it.

I haven’t tried it in OpenClaw, since GLM-4.7-Flash was what was recommended. Not sure what you tried when using Nemotron in OpenClaw; I suspect the coding aspect is where you weren’t satisfied compared with GLM-4.7-Flash.

Using local models has two benefits:

  1. Save money while getting experience with OpenClaw.

  2. Keep data that needs stronger security in your local environment, to maximize security assurance.

@fidecastro

Thanks a lot for the suggestion.

I am a newbie to Openclaw. Have naive questions:

  1. “chat template”: is it referring to the OpenClaw TUI interface? If not, how do I launch this “chat template”?
  2. If “chat template” does mean the TUI at “http://localhost:18789/chat?token=”, how do I run your script?

Thanks a lot!

@CopybaraS

Not sure what you tried when you first used Nemotron as you started playing with OpenClaw (I guess it was still Clawdbot at the time). Most likely only the chat functionality, and everything you asked OpenClaw to do was finding information, not actually doing anything.

I played with it a bit just now. Nemotron still answers your questions thoroughly, with superior intelligence. The difference between Nemotron and GLM-4.7-Flash is that it doesn’t DO THE WORK; it only gives you instructions. So an OpenClaw powered by Nemotron acts like a manager, whereas a GLM-4.7-Flash-powered OpenClaw actually does the job for you. :-)
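One way to make the “manager vs. doer” distinction concrete: in the OpenAI-compatible chat format that llama.cpp and vLLM serve, a “doer” reply carries `tool_calls`, while a “manager” reply is plain prose. The two example messages below are made up for illustration:

```python
# Check whether an assistant reply actually invoked a tool
# (OpenAI-compatible chat message format). Example messages are
# illustrative, not captured from a real run.
def does_the_work(message: dict) -> bool:
    """True if the reply contains tool calls rather than just prose."""
    return bool(message.get("tool_calls"))

# GLM-4.7-Flash-style reply: invokes a tool
doer = {"role": "assistant",
        "tool_calls": [{"function": {"name": "exec", "arguments": "{}"}}]}
# Nemotron-style reply: describes what *you* should run
manager = {"role": "assistant",
           "content": "Run `npm install`, then edit server.js ..."}

assert does_the_work(doer) and not does_the_work(manager)
```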

When I was testing Nemotron and other models with OpenClaw (back when it was still Clawdbot), I usually evaluated them based on these four criteria:

  1. Initialization Flow: Seeing if the model can correctly follow the setup process (e.g., the instructions in BOOTSTRAP.md). Disappointingly, I haven’t found a single local model that nails this perfectly yet. (If you know one, please let me know, lol!)

  2. Search & Persona: Asking questions that require web searching to see the quality of the answers and check if the tone stays true to the character’s persona.

  3. Web Development: Asking it to build a simple website. I look at the UI quality and test how many features actually work when clicked.

  4. Tool Use & Navigation: Having it use the browser (e.g., “Go to the Ollama website and find the three latest models”). I do this because search results aren’t always accurate; I want to see how well it actually handles the tools.

I don’t always get through all four steps with every model—sometimes they get so confused or messy right at the start that I just give up on them. :P
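The four-step rubric above could be jotted down as a tiny scorecard. The criterion names mirror the post; the 0-5 scale and the averaging are my own arbitrary choices:

```python
# Scorecard sketch for the four evaluation criteria; scale and
# averaging are arbitrary choices, not from the original post.
CRITERIA = ["init_flow", "search_persona", "web_dev", "tool_use"]

def score_model(results: dict) -> float:
    """Average 0-5 score over whichever criteria were actually run."""
    ran = [results[c] for c in CRITERIA if c in results]
    return sum(ran) / len(ran) if ran else 0.0

# A model abandoned after step 1 is scored only on what it attempted
print(score_model({"init_flow": 1}))  # 1.0
print(score_model({"init_flow": 4, "search_persona": 5,
                   "web_dev": 3, "tool_use": 4}))  # 4.0
```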

Quite a thorough evaluation. The question is: in a multi-agent world, should the same set of evaluation criteria apply to all models? Or should we choose a set of models where each one is an expert in particular domains, even if they are all MoE models?

Totally agree.

I see benchmarks and evaluations as tools to help us map out each model’s strengths and expertise. That’s exactly why I still stick with Nemotron-3-Nano in Open WebUI—it’s still my personal favorite for what it does.

Regarding OpenClaw, my next step is to follow Alexander-F’s approach: assigning specific tasks to different agents and using a mix of specialized models to approach the performance of a much larger model. (I think fully matching one is still a stretch, but we can get close!)

So I ran cyankiwi/MiniMax-M2.1-AWQ-4bit on my 2x dgx spark via vllm.

Over the past week I’ve done 6-7 OpenClaw installs via every method imaginable: script, Docker, source, and every possible hack, which included fighting with a broken systemd service, running via pm2, and a bunch of other nonsense.

The first 1-2 days? Honestly, lots of fun despite nothing working out of the box (as expected, of course). New stuff, a new experience, and it’s a tiny lobster living in my Jetson Nano - FUN!

But after another 5 days? MiniMax M2.1 is absolutely unusable. It’s terrible. Slow. Rude. Pointless. Five out of six prompts miss the point, and the answers read like a blend of nonsense and a cracked ego/persona. It just doesn’t work, or does things wrong (which isn’t the case with Crush or opencode).

Tomorrow I’m nuking all the OpenClaws from my machines, and I’m not coming back.

Pretty sure it’ll disappear soon, like it never existed, once everyone gets tired/exhausted of it.
And people are getting tired fast, from what I can see.

I meant the chat template you need to use when running the LLM backend (vLLM in this case). If vLLM uses the model’s default chat template, then OpenClaw won’t play well with GLM 4.5 or 4.6; hence my chat template.

Some other people asked about it, so I put together a simple-ish walkthrough on how to make OpenClaw and vLLM work together with GLM 4.5/4.6 - maybe it’s helpful. It contains the same chat template I uploaded here: GitHub - fidecastro/fix_glm46v: A fix for OpenClaw to work with GLM4.5 and GLM4.6V
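In vLLM terms, “using my chat template” means passing `--chat-template` at serve time so the template bundled with the model is overridden. The model directory and file paths below are placeholders; use the template file from the repo above:

```shell
# Serve GLM with an overridden chat template.
# Paths are placeholders -- point them at your own model dir and at
# the template file shipped in the fix_glm46v repo.
vllm serve /models/GLM-4.6V-FP8 \
  --chat-template /path/to/chat_template_glm46v.jinja \
  --host 0.0.0.0 --port 8000
```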


Thank you @fidecastro


OpenClaw works nicely with models like Opus 4.5 for about an hour, and then all your tokens are used up. But with local LLMs like Qwen3, it feels quite stupid, turning in circles and not getting anything done.


Qwen 3 Coder Next works amazing.

A while ago I used Qwen Coder Next and didn’t notice much difference from GLM 4.7 Flash. Earlier this morning, I asked OpenClaw to switch to Qwen Coder, but it loaded qwen-2.5-coder instead of Qwen3 and acted really stupid: it asked me to show the code before it could suggest any bug fix.
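One habit that avoids this kind of mix-up (assuming an Ollama backend) is to check what’s actually pulled and pin the exact tag, rather than asking for “qwen coder” by a loose name. The tag below is illustrative; check `ollama list` on your own box:

```shell
# Pin the exact model tag when switching, so a similarly named older
# model can't be loaded by mistake. Tag is illustrative -- verify
# locally with `ollama list`.
ollama list | grep -i qwen      # see which Qwen variants exist locally
ollama run qwen3-coder:30b      # explicit Qwen3 tag, not qwen-2.5-coder
```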