Why the models response super slowly?

I am using nvidia nim for the last few days, facing super slow response from the api calls that. Any other dev facing this issue or only me?

yeah if youre sending high token prompts it keep getting 502 or 504 but if you send one liner like hi or who are you its actually fast even with models like kimi

so reduce tour prompt count i guess man

Yes. It took 3 min to answer a simple question.

even for small models it is super slow.
for a 70b parameters model as well it takes a minute to respond to ‘hi’. Forget about using deepseek v4.
tried gemma4 as well - same response times.

NIM free tier looks very good on paper. JUST ON PAPER.

I can only name one thing that massively influences the performance: open claw