What data did this conclusion come from? Do you have the data table to show? Thank you!
GLM 4.7 Flash (Q4 KM) runs great on llama.cpp. To fix the tool-calling issue, make sure you've got all the latest updates for both the GGUF file and llama.cpp. Setting the temperature to 0.9 and top_p to 0.95 gives the best results. You'll also get a big speed boost by turning off thinking with --reasoning_budget 0. Sometimes I get burst speeds of 70-75 tok/s.
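For reference, here's roughly the llama-server invocation that matches those settings. This is a sketch: the model filename and port are placeholders, and flag spellings (especially the reasoning-budget one) vary between llama.cpp versions, so check `llama-server --help` on your build.

```shell
# Sketch: serve GLM 4.7 Flash Q4_K_M with the sampling settings above.
# Filename, port, and exact flag names are assumptions for your build.
llama-server \
  -m GLM-4.7-Flash-Q4_K_M.gguf \
  --temp 0.9 \
  --top-p 0.95 \
  --reasoning-budget 0 \
  --port 8080
```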
Just to clarify based on my experience: nemotron-3-nano:latest is definitely faster when it comes to inference speed. However, when paired with Roo Code and OpenClaw, it doesn't perform quite as well as GLM-4.7 Flash.
In my own tests, Nemotron actually showed higher "intelligence" (reasoning capability), so I've chosen to use it with Open WebUI. These are just my personal findings, though; different parameters or prompts might yield different results. I'd love to hear if anyone has had a different experience!
I also highly, highly recommend NOT using a local LLM when getting everything set up in OpenClaw. Spend $50 using Anthropic's latest Opus model to get everything up and running. Let it get its memory database started and let it build out most of your tools. Let it figure out its personality as well.
Then, switch over to your local LLM. I am using MiniMax 2.1 and honestly can't tell the difference. A little slower, but other than that, response- and personality-wise it's pretty much the same as an enterprise model.
Yes, OpenClaw has its security issues, but man, this is the real deal. If you have your agents working around the clock, you are saving $30-$40 a day vs. using a cloud provider.
The future is scary and exciting at the same time.
I've been toying with GLM-4.6V-FP8 and GLM-4.5-Air-FP8, serving both through vLLM on dual Ascent GX10s. I chose these two because their size is a good fit for the dual configuration, and in principle they should be great for tool use.
For those interested in trying it, these two models won't run out of the box because OpenClaw expects a "developer" role in the chat template, but these models seem to only have the "system" role. I'm attaching here a functional chat template that runs both models in case it's useful.
chat_template_glm46v.zip (1.5 KB)
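If you'd rather patch on the client side than swap the template, the same role mismatch can be worked around by remapping "developer" messages to "system" before the request reaches vLLM. This is a minimal sketch under my assumptions about OpenClaw's message shape; the function name is mine, not part of OpenClaw or vLLM.

```python
def normalize_roles(messages):
    """Remap the 'developer' role to 'system' so GLM-4.5/4.6 chat
    templates (which only define 'system') accept the message list.

    Hypothetical helper; assumes OpenAI-style message dicts with
    'role' and 'content' keys."""
    return [
        {**msg, "role": "system"} if msg.get("role") == "developer" else msg
        for msg in messages
    ]

# Example: an OpenClaw-style message list before it is sent to the backend.
msgs = [
    {"role": "developer", "content": "You are a helpful agent."},
    {"role": "user", "content": "Hello"},
]
print(normalize_roles(msgs))
```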
Same here. Before OpenClaw, I had been using Nemotron ever since I started using this DGX Spark system, even switching from Gemini 3. The thoroughness in finding answers, the smooth tone in its writing, etc., all made me love it.
I haven't tried it in OpenClaw since GLM-4.7-Flash was recommended. Not sure what you tried when using Nemotron in OpenClaw. I suspect the coding side is where you weren't satisfied compared with GLM-4.7-Flash.
Using local models has two benefits:
- Save $ while getting experience with OpenClaw.
- Keep data that needs more security in your local environment to maximize security assurance.
Thanks a lot for the suggestion.
I am a newbie to OpenClaw. I have some naive questions:
- "Chat template": is it referring to the OpenClaw TUI interface? If not, how do I launch this "chat template"?
- If "chat template" does mean the TUI, which is at "http://localhost:18789/chat?token=", how do I run your script?
Thanks a lot!
Not sure what you tried when you first used Nemotron when you started playing with OpenClaw (I guess at that time it was Clawdbot). Most likely only the chat functionality, and everything you asked OpenClaw to do was finding information, not actually doing anything.
I played a bit just now. Nemotron still behaves excellently, answering your questions thoroughly with superior intelligence. The difference between Nemotron and GLM-4.7-Flash is that it doesn't DO THE WORK; it only gives you instructions. So if OpenClaw is powered by Nemotron, it acts like a manager, whereas GLM-4.7-Flash-powered OpenClaw actually does the job for you. :-)
When I was testing Nemotron and other models with OpenClaw (back when it was still Clawdbot), I usually evaluated them based on these four criteria:
- Initialization Flow: seeing if the model can correctly follow the setup process (e.g., the instructions in BOOTSTRAP.md). Disappointingly, I haven't found a single local model that nails this perfectly yet. (If you know one, please let me know, lol!)
- Search & Persona: asking questions that require web searching to see the quality of the answers and check whether the tone stays true to the character's persona.
- Web Development: asking it to build a simple website. I look at the UI quality and test how many features actually work when clicked.
- Tool Use & Navigation: having it use the browser (e.g., "Go to the Ollama website and find the three latest models"). I do this because search results aren't always accurate; I want to see how well it actually handles the tools.
I don't always get through all four steps with every model; sometimes they get so confused or messy right at the start that I just give up on them. :P
Quite a thorough evaluation. The question is: in a multi-agent world, should the same set of evaluation criteria apply to all models? Or should we choose a set of models where each one is an expert in particular domains, even if they are all MoE models?
Totally agree.
I see benchmarks and evaluations as tools to help us map out each model's strengths and expertise. That's exactly why I still stick with Nemotron-3-Nano in Open WebUI; it's still my personal favorite for what it does.
Regarding OpenClaw, my next step is to follow Alexander-F's approach: assigning specific tasks to different agents and using a mix of specialized models to approach the performance of a much larger model. (I think fully matching one is still a stretch, but we can get close!)
So I ran cyankiwi/MiniMax-M2.1-AWQ-4bit on my 2x dgx spark via vllm.
Over the past week I've done like 6-7 OpenClaw installs via every method imaginable: from script, Docker, source, and every possible hack, which included fighting a broken systemd service, running via pm2, and a bunch of other nonsense.
First 1-2 days? Honestly kinda lots of fun despite nothing working out of the box (as expected, of course). New stuff, new experience, and it's a tiny lobster living in my Jetson Nano - FUN!
But after another 5 days? MiniMax M2.1 is absolutely unusable. It's terrible. Slow. Rude. Pointless. Five out of six prompts miss the point, and the answers read like a blend of nonsense and cracked ego/persona. It just doesn't work, or does things wrong. (Which isn't the case with Crush or OpenCode.)
Tomorrow I'm nuking all OpenClaw installs from my machines, and I'm not coming back.
Pretty sure it'll disappear soon, like it never existed, once everyone gets tired and exhausted of it.
And people are getting tired fast, as far as I can see.
I meant the chat template you'd need to use when running the LLM backend (vLLM in this case). If vLLM uses the model's default chat template, then OpenClaw won't play well with GLM4.5 or 4.6, hence using my chat template.
Some other people asked about it, so I put together a simple-ish walkthrough on how to make OpenClaw and vLLM work together with GLM4.5/4.6. Maybe this is helpful. It contains the same chat template I uploaded here: GitHub - fidecastro/fix_glm46v: A fix for OpenClaw to work with GLM4.5 and GLM4.6V
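To answer the "how do I run your script" question concretely: you hand the template to vLLM at serve time with its `--chat-template` flag, so nothing changes on the OpenClaw side. A sketch, where the model ID, the template path, and the tensor-parallel size are placeholders for your own setup:

```shell
# Sketch: serve GLM with a custom chat template instead of the model's default.
# Model ID, template path, and --tensor-parallel-size are placeholders.
vllm serve zai-org/GLM-4.5-Air-FP8 \
  --chat-template ./chat_template_glm46v.jinja \
  --tensor-parallel-size 2
```

OpenClaw then just points at the resulting OpenAI-compatible endpoint as usual.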
Thank you @fidecastro
OpenClaw works nicely with models like Opus 4.5 for about an hour, then all your tokens are used up. But with local LLMs like Qwen3, it feels quite stupid: turning in circles, not getting anything done.
Qwen 3 Coder Next works amazing.
Some time ago I used Qwen Coder Next and didn't notice much difference from GLM 4.7 Flash. Earlier this morning I asked OpenClaw to switch to Qwen Coder, but it loaded qwen-2.5-coder instead of Qwen3 and acted really stupid: it asked me to show the code before it could suggest any bug fix.