I understand we shouldn’t expect much from a free API intended for testing, but the constant overload makes the new models practically unusable. If providing more computing power isn’t feasible, why not lower the rate limit to 5-10 requests per minute (RPM)? A limit in that range would have little impact on regular users, but would significantly reduce the potential for abuse of the free API.
This isn’t a complaint; I’m just curious if you plan to address the overload somehow.
It looks like Nvidia has begun lowering the RPM limits for high-demand models. Hopefully, this will help make these models more accessible to everyone.