Well. Just got Mistral Small 4 running on my Spark, ready to take it for a test ride over the weekend, and the next model pops up. NVIDIA seems to be pretty serious about shipping more useful models and improving them, because… the more you buy, the more you save!
The comparison chart makes me curious - it claims to be an even better coder than Qwen3.5-397B-A17B and Kimi-K2.5-1T.
Would love to hear feedback on real-life use cases from you.
(EDIT) And I will also test it on my own, of course.
I managed to run it following the configuration of Nemotron and other Nemotron relatives, but it blew up at high context lengths and only gave me 31 t/s, so I haven't been able to see its usefulness in the real world.
Damn, I forgot to set the reasoning parser on my first attempts. Thanks for posting your recipe.
Maybe NVIDIA needs to pimp opencode and make a "NemoCode" first…
I will do some testing today after being disappointed by Mistral yesterday. I also tried Mistral's Vibe CLI… it didn't convince me either. I used it after seeing template issues (see other post).
The one in the stelterlab repo was made by me, using the same recipe I used for the previous Nemo. The AWQ quant delivers around 70 t/s on a single Spark. I will do more tests today with opencode, and run a full benchmark with llama-benchy.
I spun up this model in an OpenCode project and asked it to list all the JavaScript files in a folder.
My main model at the moment, MiniMax M2.5, happily reported all the files it found, including those in subfolders. This model, however, seemed to stumble at the starting point: it reported that it couldn't find any files and didn't bother to look in subfolders.
Not an auspicious start as a demonstration of its reasoning abilities.
It must be good for something, but certainly not for coding.
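For reference, the task itself is easy to state precisely - a recursive listing amounts to a single `rglob` call. A minimal Python sketch of what the correct answer looks like (the folder and file names here are made up for illustration):

```python
import tempfile
from pathlib import Path

def list_js_files(root: str) -> list[str]:
    """Recursively list all .js files under root, including subfolders."""
    return sorted(str(p) for p in Path(root).rglob("*.js"))

# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "lib").mkdir()
    (Path(root) / "app.js").touch()
    (Path(root) / "lib" / "util.js").touch()
    (Path(root) / "README.md").touch()
    print(list_js_files(root))  # app.js and lib/util.js, but not README.md
```

A model that stops at the top level has effectively run `glob` where `rglob` was needed.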
Or maybe I just found a new challenge for coding models. I stumbled upon FastMCP a while ago, and I had actually planned to take a deeper look into it this weekend… so I thought combining both would be a good idea.
I used opencode and VS Code Insiders, just to see whether opencode is simply bad in combo with that model. Both agents had access to context7. My test mission:
Step 1
Create a new Python application. Use uv for package management and use Python 3.12. Create a .venv with uv and install dependencies with uv.
The Python application shall use the FastMCP library and serve a tool that returns the current date and time via MCP.
Step 2
Create a simple client to test the server.
End of mission.
What shall I say. It failed badly. It used context7 to get the info on how to use FastMCP, but I'm not sure Nemo understood what it got.
I first tried my AWQ quant and then the full model, which was even worse at using the tools…
I also tried good old gpt-oss-120b, because it was already running on another system, and it succeeded in one shot. Nemo struggled even with the regular Python version on my Mac, so I gave more hints in the first prompt… but it failed again. Maybe it's better with more precise instructions… but then it is not my model.
Well. I highly doubt that the "extra" world knowledge it also includes is necessary for coding. I still think a bunch of smaller, more specialized models working together would be much better.
I would also like to see some smaller (<= 35B) fine-tuned models trained on the CoderForge dataset, which together.ai released as open source.
BTW, out of curiosity I just tried Qwen 3.5 35B on that problem, and it also did it in one shot. That model was named as a good replacement for gpt-oss-120b.
You can use nemotron-3-nano or nemotron-3-super mod.
But I'm observing some strange behavior from Nemotron 3 Super as well; I wonder if something is broken in vLLM again. It passes tests because it can give coherent responses, but the responses are still dumb across the entire Nemotron line.