Hi NVIDIA Team,
I am currently developing a personal multi-agent AI workflow system for research and learning purposes.
Current rate limit: 40 RPM
Requested limit: 200 RPM
Use case:
- Multi-agent orchestration
- RAG workflows
- Memory compression
- Tool calling
- Planning agents
- Summarization pipelines
The current 40 RPM limit is restrictive for concurrent agent workflows, since a single user request can fan out into multiple lightweight inference calls.
To stay within quota, I have already implemented:
- local rate limiting
- request queue
- concurrency control
- caching
- summarization-based context compression
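For reference, the local rate limiting above follows a standard token-bucket pattern. A minimal sketch is below; the class name and parameters are illustrative, not my exact implementation:

```python
import threading
import time

class TokenBucket:
    """Client-side rate limiter allowing `rate_per_min` requests per minute."""

    def __init__(self, rate_per_min: int):
        self.capacity = rate_per_min
        self.tokens = float(rate_per_min)
        self.fill_rate = rate_per_min / 60.0  # tokens replenished per second
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> bool:
        """Take one token if available; return False when over the limit."""
        with self.lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.fill_rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# Calls that fail acquire() are parked in the request queue and retried later.
limiter = TokenBucket(rate_per_min=40)
```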
This is for personal development and experimentation only, not a public commercial deployment.
Account email: 478347388@qq.com
Thank you for your consideration.