Request for NVIDIA NIM API Rate Limit Increase (40 → 200 RPM)

Hello NVIDIA Support Team,

I am writing to request a rate limit increase for my individual developer account. I am hitting severe 429 bottlenecks while testing and prototyping my applications.

Account Email: tanishshivhare2 (at) gmail.com

Current Limit: 40 RPM

Requested Limit: 200 RPM

Primary Models Used: Llama 3 70B, Nemotron 4, DeepSeek, and GLM 5.1

Development Use Case:
I am building a local development application involving a multi-agent coding assistant and a RAG system for document processing. Because my architecture uses chained and sequential API calls per user action, a single testing loop instantly exhausts the default 40 RPM sandbox threshold.

A bump to 200 RPM will allow me to test multi-step user workflows and properly evaluate performance before transitioning to self-hosted infrastructure. I have already implemented client-side exponential backoff to handle traffic responsibly.
Thank you for your review and support!

Hey @tanishshivhare2, I don’t know if this helps, but a moderator previously gave this reply to someone asking for the same thing: “Many of you are using free tier API access to NVIDIA NIMs. This usually involves a rate limit that is dependent on model, use-case and the amount of current overall traffic using the same access. There is no official way to circumvent this rate limit or to receive a rate limit increase on that same tier. And specifically here on the forums we do not have any influence on those rate limits. To make full use of a NIM blueprint you will need to deploy it. For more details on NVIDIA NIM refer to…”

Basically, this means that if you are using the free-tier API, there is no way to get an RPM increase on that tier. Two things to know about getting higher RPM limits legitimately:

Not through the forum. Moderators have said it over and over again: the forum is not the place to request RPM increases, as there are other channels and processes for that.

You need to pay and deploy a model through NVIDIA NIM / NVIDIA Build if you require higher usage limits and production-level access.

First, go to the model you prefer, for example DeepSeek V4 Flash. In that section you’ll see three options: Experience / Model Card / Deploy.

Click on Deploy and you’ll see several options such as Partner Endpoints or Self-hosted Deployments.

From there, you choose the option you want, and it will show you the pricing and deployment costs.

You can start with DeepSeek Flash since it’s the cheapest one, so you can learn how the process works first. And if you need help, you can always contact a moderator privately and ask for guidance. There’s no problem with sending a direct message saying you want to pay and deploy properly.

I am not saying this to be toxic, rude, or disrespectful. My goal is simply to help you understand and follow NVIDIA’s rules and the guidance that has already been provided by moderators multiple times.

IF YOU ARE ALREADY PAYING FOR NVIDIA SERVICES AND DEPLOYED MODELS, THEN YOU SHOULD CONTACT NVIDIA DIRECTLY THROUGH THE APPROPRIATE SUPPORT CHANNELS, SUCH AS EMAIL OR PRIVATE COMMUNICATION WITH THE RELEVANT SUPPORT TEAM, RATHER THAN MAKING RPM INCREASE REQUESTS ON THE FORUM.

Actually, the issue is that when I use the NVIDIA NIM API (I am a student and don’t have the resources or money to deploy a model or use a paid tier), it continuously returns error 429 (Too Many Requests) or "resources exhausted", even if I wait 2 or 10 minutes before trying again. After that, if I pause the execution and restart with a simple greeting (like "Hello") to check whether it works, it still shows the same error, even though this is a very simple request. So please help me 🥺

Unfortunately, there is no solution. You just need to be patient and avoid sending requests so quickly, since that is what causes the problem. Yes, it is free, but just because it is free does not mean it should perform like a paid service.
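
If it helps, here is a rough Python sketch of what "not sending requests so quickly" could look like on the client side. It is only an illustration, not anything official from NVIDIA: the names RequestPacer, backoff_delay, and call_with_retry are made up for this post, and send stands in for whatever function you already use to call the API (assumed here to return a status code and a body). At 40 RPM the pacer leaves 60 / 40 = 1.5 seconds between calls, and a 429 triggers exponential backoff with random jitter. Since the moderator quote says the free-tier limit also depends on the model and on overall traffic, pacing like this can reduce 429s but cannot guarantee you never see them.

```python
import random
import time


class RequestPacer:
    """Spaces out calls so that at most `rpm` requests start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # 40 RPM -> 1.5 s between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed to start."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval


def backoff_delay(attempt, base=2.0, cap=60.0, jitter=random.random):
    """Exponential backoff with jitter.

    Returns a delay between 0 and min(cap, base * 2**attempt) seconds.
    The base and cap values are arbitrary defaults, not NVIDIA guidance.
    """
    return min(cap, base * 2 ** attempt) * jitter()


def call_with_retry(send, pacer, max_attempts=5, sleep=time.sleep):
    """Call `send()` through the pacer and retry when it reports HTTP 429.

    `send` is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        pacer.wait()
        status, body = send()
        if status != 429:
            return body
        sleep(backoff_delay(attempt))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

To use it, create one pacer for the whole program, for example pacer = RequestPacer(40), and route every agent and RAG call through call_with_retry(your_send_function, pacer), so that chained calls share the same budget instead of each step firing immediately.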