Request for NVIDIA NIM API rate limit increase (40 RPM → 200 RPM)

Hello NVIDIA team,

I am integrating the NVIDIA NIM APIs from build.nvidia.com into an agent-based workflow that I use for development and evaluation.

My setup uses a hybrid inference architecture combining:

  • local GPU inference (RTX 3060)
  • NVIDIA cloud NIM APIs for higher-capability reasoning models
  • tool-calling and multi-step agent pipelines (Hermes-style workflow)

During testing, the agent framework frequently performs:

  • parallel tool calls
  • multi-step reasoning loops
  • RAG-style retrieval evaluation
  • prompt iteration and response comparison

Because a single agent task can fan out into many API requests, the default limit of 40 requests per minute (RPM) is reached very quickly during normal experimentation.

This environment is currently used for:

  • agent pipeline testing
  • hybrid inference orchestration validation
  • structured prompting evaluation
  • model capability benchmarking

The API is not yet used in production, but it is part of an active development workflow.

I would greatly appreciate it if the rate limit could be increased to around 200 RPM to support this development setup.

Thank you very much for supporting developers working with NVIDIA NIM.