Rate Limit Increase Request: Developer Research & Model Fine-Tuning

Hello NVIDIA API Support Team,

I am an individual developer working on a personal research project involving large-scale instruction/response dataset generation for a specific domain-expert fine-tuning pipeline. I am requesting a rate limit increase to help complete a one-time generation phase for my local model development.

Project Context: I am building a high-precision domain-specific dataset (~100,000 entries) to fine-tune a local 7B parameter model. My pipeline currently runs on moonshotai/kimi-k2-thinking, which is excellent for this task but has high per-request latency.

Technical Hurdle: Because the Kimi-K2 model has long internal “thinking” cycles, I need to run high concurrency (20 workers) to maintain a reasonable generation timeline. However, the 40 RPM ceiling on the developer tier is causing frequent 429 errors, which significantly extend the generation runtime.
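For reference, the generation loop is throttled roughly like this (a minimal sketch, not my actual pipeline code; the limiter class, worker function, and the stubbed model call are illustrative):

```python
import threading
import time

class RpmLimiter:
    """Shared limiter that spaces requests so total rate stays under an RPM cap."""

    def __init__(self, rpm):
        self.interval = 60.0 / rpm          # minimum spacing between requests
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()   # earliest time the next request may go out

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            self.next_slot = max(now, self.next_slot) + self.interval
        if wait > 0:
            time.sleep(wait)

def worker(limiter, jobs, results):
    # Each worker pulls prompts until the shared job list is empty.
    while True:
        try:
            prompt = jobs.pop()
        except IndexError:
            break
        limiter.acquire()
        # The real model call to the endpoint would go here; stubbed for the sketch.
        results.append(f"generated:{prompt}")

limiter = RpmLimiter(rpm=40)                # the current developer-tier ceiling
jobs = [f"prompt-{i}" for i in range(3)]
results = []
threads = [threading.Thread(target=worker, args=(limiter, jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```

Even with many workers, the shared limiter serializes request start times, so the 429s at 40 RPM come from the ceiling itself rather than from burst behavior.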

Request Details:

I am requesting a temporary increase from 40 RPM to 200 RPM. This is a personal, non-commercial project focused on exploring the limits of specialized dataset generation; the increase would allow me to finish this one-time data generation phase without 429-related interruptions.

Thank you very much for your time and for supporting individual developers on your platform.

Best regards, Joseph Saghbini.

To assist with this request, my NVIDIA Cloud Account ID is 0928046057701069. This is for the moonshotai/kimi-k2-thinking model on the https://integrate.api.nvidia.com/v1 endpoint.

Hi @TomNVIDIA and @sophwats,

Following up on my request (posted ~48 hours ago) for a temporary rate limit increase to 200 RPM for the moonshotai/kimi-k2-thinking model.

I am currently hitting consistent 429 errors at the 40 RPM limit, which is stalling my research dataset generation (~100k entries).

Account Details for Review:

  • NVIDIA Cloud Account ID: 0928046057701069

  • Email: jsaghbini@outlook.com

  • Context: High-concurrency (20 workers) needed to offset the long thinking cycles of this specific model.

  • Duration: This is a one-time burst for a personal research project.

Thank you for your help in supporting individual developers on the platform!

Hi @Aharpster,

Can you please chime in here?

Hi @Aharpster, thank you for looking into this,

To clarify the request: I am an individual developer working on a personal research project (dataset generation). Because the kimi-k2-thinking model has high-latency reasoning cycles, the standard 40 RPM limit is causing constant 429 errors even with low throughput.

Account Details:

  • NVIDIA Cloud Account ID: 0928046057701069

  • Email: jsaghbini@outlook.com

  • Target: Increase from 40 RPM to 200 RPM for a one-time burst generation (~100k entries).

I already have exponential backoff logic in place to ensure I stay within the new limit. Happy to provide any other details you need!
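A minimal sketch of the kind of backoff logic mentioned above (the request function, exception type, and delay values are illustrative placeholders, not the pipeline's actual code):

```python
import random
import time

def call_with_backoff(request, max_retries=6, base_delay=1.0):
    """Retry `request` on rate-limit failures with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return request()
        except RuntimeError:  # stand-in for a 429 rate-limit error
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the error
            # Double the delay each attempt, with a little jitter to avoid
            # synchronized retries across workers.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            time.sleep(delay)

# Demo: a request that fails twice with a simulated 429, then succeeds.
state = {"calls": 0}

def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(call_with_backoff(flaky_request, base_delay=0.01))
```

With a raised ceiling, this keeps transient 429s from stalling the run while ensuring the sustained rate stays under the limit.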