I’ve been using the NVIDIA NIM API to run inference with the LLaMA 3.3 70B-Instruct model.
Recently, the API has repeatedly returned:
Error code: 504 (Gateway Timeout)
This has happened multiple times over the past three days. I have also noticed that generation is significantly slower than usual.
Have there been any updates, server-side changes, or load issues on the NIM platform recently?
Any help or clarification would be greatly appreciated. Thank you!