Open-Source CLI Agent Framework for NVIDIA AI Endpoints - Seeking Feedback

Hi everyone,

I built an open-source Python CLI that turns NVIDIA’s AI endpoints into an agentic coding assistant. Think Claude Code or Cursor, but running against models on build.nvidia.com.

What it does:

  • ReAct agent loop (reason, pick tool, execute, observe, repeat)

  • Reads/writes files, runs shell commands, searches your codebase

  • Persistent memory across sessions (SQLite + vector/BM25 hybrid search)

  • File-based agent identity system inspired by OpenClaw

  • Installable skills with security scanning

  • Works with any model on NVIDIA’s OpenAI-compatible API

Tested with: Nemotron Nano 12B, Llama 3.1 70B/8B, DeepSeek v3.2
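For anyone curious what the ReAct loop above amounts to, here is a minimal sketch. This is an illustration of the general pattern, not the repo's actual implementation; the `llm` callable, tool registry, and JSON action format are all assumptions for the example.

```python
import json

def react_loop(llm, tools, task, max_steps=10):
    """Minimal ReAct-style loop: reason, pick a tool, execute, observe, repeat.

    `llm` is any callable mapping a message list to a JSON action string,
    e.g. {"tool": "read_file", "args": {...}} or {"tool": "finish", "answer": ...}.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Ask the model to reason and emit its next action as JSON.
        action = json.loads(llm(messages))
        if action["tool"] == "finish":
            return action["answer"]
        # Execute the chosen tool and feed the observation back to the model.
        observation = tools[action["tool"]](**action.get("args", {}))
        messages.append({"role": "assistant", "content": json.dumps(action)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Step limit reached"
```

The real agent adds memory retrieval, identity middleware, and error handling around this core, but the reason/act/observe cycle is the skeleton everything else hangs off.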

```
pip install -e .
export NVIDIA_API_KEY="nvapi-..."
nv chat
nv> /init        # analyzes your codebase
nv> /model llama70
nv> Fix the bug in auth.py
```

Looking for:

  1. Which NVIDIA-hosted models work best for tool-use / agentic coding?

  2. Feedback on the architecture from anyone building similar things

  3. Any interest from the NVIDIA team if this aligns with NIM or developer tooling efforts

GitHub: https://github.com/SingularityAI-Dev/Nvidia-CLI
MIT licensed, Python 3.9+. PRs welcome.

Hi @rain.singlesource, welcome to the community!

Allow me some time to review it. In the meantime, I'll keep this thread open for community feedback.

Excellent work!

Hi @athkumar

Thank you for responding, I really appreciate it. However, shortly afterwards I realised you guys built nemoclaw lol, so it feels a little redundant now.

But it's still a CLI tool, and even if nothing comes of it, I'm more than pleased I got this far. I'm happy to keep pushing and building with NVIDIA tools; I really enjoy working in the NVIDIA environment and have learnt a lot from the community.

So all said, thanks.

Quick documentation update — v7.0 now has a proper visual architecture breakdown

Hey all, just pushed a significant documentation update to the repo if you want a clearer picture of how everything fits together.

The README now includes:

  • Animated architecture diagram showing the full data flow — from CLI entry points through the ReActAgent loop, Soul middleware, Hybrid Memory, Tools, Skills scanner, and out to the NVIDIA NIM API endpoints (including which models sit where)

  • Animated terminal demos for each major feature — multi-agent orchestration, Soul/Identity loading, the Skills security scanner, Heartbeat scheduler, and the /init codebase analysis flow

All visuals are native SVGs so they render and animate directly in GitHub without needing to click anywhere.

Repo: https://github.com/SingularityAI-Dev/Nvidia-CLI

Still very much open to feedback — particularly around the ReActAgent loop design and whether the hybrid memory approach (vector + BM25) is something others have found useful in their own agentic setups.
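On the hybrid memory question: one common way to combine vector and BM25 results is reciprocal rank fusion (RRF), which merges ranked lists without needing to normalise their raw scores. A minimal sketch, purely illustrative of the fusion idea rather than a claim about what the repo implements:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked result lists (e.g. one from vector search,
    one from BM25) by reciprocal rank: a document ranked highly in any
    list accumulates a high overall score. k=60 is the conventional
    constant that damps the influence of top ranks."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The appeal over weighted score blending is that BM25 scores and cosine similarities live on incompatible scales, whereas ranks are always comparable.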
