With the influx of new users, almost all new models have become unusable, even light and fast ones like Qwen3.5-397B-A17B. Can we expect this to be resolved anytime soon? If not, could there at least be more transparency about the current per-model rate limits, either on each model's page or in some other way that sets expectations?
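In the meantime, the only client-side workaround I've found is to treat overload as a rate-limit signal and back off. Below is a minimal sketch, assuming the OpenAI-compatible endpoint at `integrate.api.nvidia.com` and that throttling surfaces as HTTP 429 (`openai.RateLimitError`); the model id is hypothetical, following the catalog's usual `vendor/model` naming:

```python
# Minimal retry-with-backoff sketch for an overloaded model.
# Assumptions: OpenAI-compatible NVIDIA endpoint, 429 on overload,
# and a hypothetical model id -- adjust all three to your setup.
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

def chat_with_backoff(prompt: str, model: str, max_retries: int = 5) -> str:
    """Retry a chat completion with exponential backoff on 429s."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(delay)
            delay *= 2  # double the wait each time we are throttled

print(chat_with_backoff("Hello", model="qwen/qwen3.5-397b-a17b"))
```

This helps smooth over brief spikes, but without published per-model limits there is no way to tune the delays sensibly, which is exactly why documented limits would help.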