Hi everyone,
I built an open-source Python CLI that turns NVIDIA’s AI endpoints into an agentic coding assistant. Think Claude Code or Cursor, but running against models on build.nvidia.com.
What it does:

- ReAct agent loop (reason, pick tool, execute, observe, repeat)
- Reads/writes files, runs shell commands, searches your codebase
- Persistent memory across sessions (SQLite + vector/BM25 hybrid search)
- File-based agent identity system inspired by OpenClaw
- Installable skills with security scanning
- Works with any model on NVIDIA's OpenAI-compatible API
Tested with: Nemotron Nano 12B, Llama 3.1 70B/8B, DeepSeek v3.2
```shell
pip install -e .
export NVIDIA_API_KEY="nvapi-..."
nv chat
nv> /init          # analyzes your codebase
nv> /model llama70
nv> Fix the bug in auth.py
```
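For anyone curious how the "reason, pick tool, execute, observe, repeat" loop from the feature list maps to code, here is a minimal sketch. The names (`react_loop`, `ask_model`, the `TOOLS` registry) are illustrative, not the actual Nvidia-CLI internals:

```python
import subprocess

# Hypothetical tool registry: the real CLI exposes file, shell, and search
# tools; two stand-ins are enough to show the loop shape.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "shell": lambda cmd: subprocess.run(
        cmd, shell=True, capture_output=True, text=True
    ).stdout,
}

def react_loop(ask_model, task, max_steps=5):
    """ask_model(history) returns either
    {"thought": ..., "tool": name, "args": [...]} for another step,
    or {"final": answer} when the agent decides it is done."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = ask_model(history)            # reason + pick tool
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["tool"]](*step["args"])  # execute
        history.append({"role": "tool", "content": str(observation)})  # observe
    return None  # step budget exhausted
```

The model only ever sees the growing `history`; capping `max_steps` keeps a confused model from looping forever.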
Looking for:

- Which NVIDIA-hosted models work best for tool use / agentic coding?
- Feedback on the architecture from anyone building similar things
- Any interest from the NVIDIA team if this aligns with NIM or developer tooling efforts
GitHub: https://github.com/SingularityAI-Dev/Nvidia-CLI
MIT licensed, Python 3.9+. PRs welcome.
Hi @rain.singlesource, welcome to the community!
Allow me some time to review it; in the meantime I'll keep this open for community feedback.
Excellent work!
Hi @athkumar,
Thank you for responding, I really appreciate it. However, I realised shortly afterwards that you've already built nemoclaw, lol, so this feels a little redundant now.
But it's still a CLI tool, and even if nothing comes of it, I'm more than pleased I got this far. I'm happy to keep pushing and building with NVIDIA tools; I really enjoy working in the NVIDIA environment and have learnt a lot from the community.
So, all said, thanks.
Quick documentation update — v7.0 now has a proper visual architecture breakdown
Hey all, just pushed a significant documentation update to the repo if you want a clearer picture of how everything fits together.
The README now includes:

- Animated architecture diagram showing the full data flow, from CLI entry points through the ReActAgent loop, Soul middleware, Hybrid Memory, Tools, Skills scanner, and out to the NVIDIA NIM API endpoints (including which models sit where)
- Animated terminal demos for each major feature: multi-agent orchestration, Soul/Identity loading, the Skills security scanner, Heartbeat scheduler, and the /init codebase analysis flow
All visuals are native SVGs so they render and animate directly in GitHub without needing to click anywhere.
Repo: https://github.com/SingularityAI-Dev/Nvidia-CLI
Still very much open to feedback — particularly around the ReActAgent loop design and whether the hybrid memory approach (vector + BM25) is something others have found useful in their own agentic setups.
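On the hybrid memory question: one common way to combine a BM25 ranking with a vector-similarity ranking is reciprocal rank fusion (RRF). A minimal sketch, where the `rrf_merge` helper and the example doc ids are hypothetical rather than the repo's actual implementation:

```python
from collections import defaultdict

def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: combine several ranked lists of doc ids.
    Each doc scores sum(1 / (k + rank)) across the lists it appears in,
    so documents ranked highly by BOTH retrievers float to the top."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a keyword (BM25) ranking with a vector-similarity ranking.
bm25_hits   = ["auth.py", "db.py", "main.py"]
vector_hits = ["main.py", "auth.py", "utils.py"]
merged = rrf_merge([bm25_hits, vector_hits])
```

RRF is attractive for this setup because it only needs ranks, so there's no need to calibrate BM25 scores against cosine similarities before merging.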