Hello NVIDIA team,
I am currently integrating NVIDIA NIM APIs from build.nvidia.com into an agent-based workflow environment for development and evaluation.
My setup uses a hybrid inference architecture combining:
- local GPU inference (RTX 3060)
- NVIDIA cloud NIM APIs for higher-capability reasoning models
- tool-calling and multi-step agent pipelines (Hermes-style workflow)
During testing, the agent framework frequently performs:
- parallel tool calls
- multi-step reasoning loops
- RAG-style retrieval evaluation
- prompt iteration and response comparison
Because of this workflow structure, a single agent task can issue many requests (one per reasoning step, plus fanned-out tool calls and retrieval evaluations), so the default 40 requests-per-minute (RPM) limit is reached very quickly during normal experimentation.
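To make the pressure on the limit concrete, here is a rough back-of-the-envelope sketch. The per-step counts are illustrative assumptions for a typical agent task, not measurements from the actual pipeline:

```python
# Illustrative request budget per agent task (assumed counts, not measured).
PARALLEL_TOOL_CALLS = 4   # tool calls fanned out per reasoning step
REASONING_STEPS = 3       # multi-step loop iterations per task
RETRIEVAL_EVALS = 2       # RAG retrieval-evaluation requests per task

def requests_per_task(tool_calls: int = PARALLEL_TOOL_CALLS,
                      steps: int = REASONING_STEPS,
                      retrievals: int = RETRIEVAL_EVALS) -> int:
    """One model request per reasoning step, plus that step's tool
    calls, plus standalone retrieval-evaluation requests."""
    return steps * (1 + tool_calls) + retrievals

# Under these assumptions, each task consumes 17 requests,
# so a 40 RPM quota allows only ~2 full tasks per minute.
print(requests_per_task(), 40 // requests_per_task())  # 17 2
```

Even with conservative assumptions, two tasks per minute leaves no headroom for prompt iteration or side-by-side response comparison.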
This environment is currently used for:
- agent pipeline testing
- hybrid inference orchestration validation
- structured prompting evaluation
- model capability benchmarking
The API is not yet used for production deployment, but it is part of an active development workflow.
I would greatly appreciate it if the rate limit could be increased to around 200 RPM to support this development setup.
Thank you very much for supporting developers working with NVIDIA NIM.