Experiencing 504 Gateway Timeout & Slower Inference Speed with LLaMA 3.3 70B on NIM API

I’ve been using the NVIDIA NIM API to run inference with the LLaMA 3.3 70B-Instruct model.

Recently, I have encountered repeated issues where the API returns:
Error code: 504 (Gateway Timeout)

This has happened multiple times over the past three days. In addition, I have noticed that generation speed is significantly slower than usual.
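
For reference, here is a minimal sketch of the kind of call that hits the timeout, assuming the OpenAI-compatible chat completions endpoint at integrate.api.nvidia.com and the official openai Python client; the model id meta/llama-3.3-70b-instruct and the simple retry wrapper are illustrative rather than my exact setup:

```python
import os
import time

from openai import OpenAI, APITimeoutError, InternalServerError

# NIM's OpenAI-compatible endpoint; reads the API key from NVIDIA_API_KEY.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

def chat_with_retry(prompt: str, retries: int = 3) -> str:
    """Send one chat request, backing off and retrying on gateway timeouts."""
    for attempt in range(retries):
        try:
            completion = client.chat.completions.create(
                model="meta/llama-3.3-70b-instruct",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.2,
                max_tokens=1024,
            )
            return completion.choices[0].message.content
        except (APITimeoutError, InternalServerError):
            # A 504 from the gateway surfaces as a 5xx error here.
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

print(chat_with_retry("Hello!"))
```

Even with the backoff, the 504s keep recurring, which is why I suspect something on the platform side.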

Have there been any updates, server-side changes, or system load issues on the NIM platform recently?

Any help or clarification would be greatly appreciated. Thank you!

Hi @orz0310tw, Thanks for bringing this to our attention!

We have identified some issues with this model and have deployed updates to fix them.

Can you please confirm that the API calls are now working as expected?

Best,

Sophie