Dear NVIDIA NIM Support Team,
I am writing to formally request a rate limit increase for my NVIDIA NIM API account, currently capped at 40 requests per minute (RPM). I respectfully request an increase to 200 RPM to support my ongoing personal and educational projects.
Current Limit: 40 RPM
Requested Limit: 200 RPM
Background
I am a student and independent developer with a growing interest in AI-integrated systems and software development. I am currently running three distinct workloads on top of NVIDIA NIM as my primary inference backend, all of which have outgrown the constraints of the current rate limit.
Project 1: Hermes Agent — Self-Hosted Personal AI Assistant
I am running a self-hosted instance of Hermes Agent (by Nous Research) on a dedicated Debian server, with NVIDIA NIM serving as the primary inference backend via the nvidia/nemotron-3-super-120b-a12b model. The agent is integrated with Telegram for real-time interaction and Discord for automated task delivery.
The system performs a wide range of agentic tasks including web research, document analysis, file operations, and scheduled cron jobs. A single complex multi-step task — such as a research pipeline or a document summarization workflow — can generate 30–80 sequential and parallel API calls. The agent runs continuously in the background, and the 40 RPM limit introduces artificial latency and causes frequent rate limit errors during peak usage.
Estimated daily API call volume at 200 RPM: 3,000 – 5,000 requests.
Primary model in use: nvidia/nemotron-3-super-120b-a12b.
Project 2: OpenClaw — Multi-Channel AI Gateway
In parallel, I am running OpenClaw as a multi-channel AI gateway, also backed by NVIDIA NIM. This instance handles real-time messaging across multiple chat platforms and executes skill-based workflows. Combined with the Hermes Agent workload, the two systems share the same API key and collectively generate concurrent API traffic that the current 40 RPM cap cannot reliably sustain.
Project 3: Educational Programming Assistant
As a student actively learning software development, I use NVIDIA NIM-backed tooling to assist with understanding clean code principles, reviewing my own code, and getting explanations of programming concepts in real time. This includes iterative back-and-forth inference sessions where code snippets are submitted, analyzed, and refined across multiple turns — a pattern that generates a higher-than-average number of API calls per learning session.
Summary
Across all three workloads, I am experiencing a sustained and growing demand for reliable, low-latency inference. NVIDIA NIM was chosen as the backbone of my setup due to the exceptional quality of its models, the OpenAI-compatible API design, and the performance of its TensorRT-optimized inference stack.
I kindly request that my account be upgraded to 200 RPM at the earliest opportunity. I am happy to provide any additional technical details or usage information upon request.
Thank you sincerely for your time and consideration.
Respectfully,
Marcell Forró