Experiencing 504 Gateway Timeout & Slower Inference Speed with LLaMA3.3 70B on NIM API

I’ve been using the NVIDIA NIM API to run inference with the LLaMA 3.3 70B-Instruct model.

Recently, I have encountered repeated issues where the API returns:
Error code: 504 (Gateway Timeout)

This has happened multiple times over the past three days, and I have also noticed that generation is significantly slower than usual.

Have there been any updates, server-side changes, or system load issues on the NIM platform recently?

Any help or clarification would be greatly appreciated. Thank you!
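In case it helps anyone hitting the same thing: while waiting on a server-side fix, transient 504s can often be smoothed over client-side with retries and exponential backoff. Below is a minimal sketch of such a wrapper. The `GatewayError` class, the `flaky` callable, and the set of retryable status codes are all illustrative assumptions for the demo; they are not part of any official NIM SDK. In practice you would wrap whatever HTTP call you make to the NIM endpoint.

```python
import random
import time


class GatewayError(Exception):
    """Illustrative stand-in for an HTTP error carrying a status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retries(call_fn, max_attempts=4, base_delay=1.0,
                      retryable=frozenset({502, 503, 504})):
    """Call call_fn, retrying with exponential backoff plus jitter
    whenever it raises a GatewayError with a retryable status."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_fn()
        except GatewayError as exc:
            # Give up on non-retryable statuses or on the last attempt.
            if exc.status not in retryable or attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)


# Demo with a hypothetical flaky endpoint that 504s twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise GatewayError(504)
    return "ok"

print(call_with_retries(flaky, base_delay=0.01))  # prints "ok" after two retries
```

Backoff with jitter keeps clients from hammering an already overloaded gateway in lockstep; it won't fix a genuinely slow model deployment, but it does ride out brief 504 spikes.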

Hi @orz0310tw, Thanks for bringing this to our attention!

We have identified some issues with this model and have made updates to fix them.

Can you please confirm that the API calls are now working as expected?

Best,

Sophie


I got the same error today.

While using the NVIDIA NIM API with deepseek-ai/deepseek-r1, I’ve repeatedly received 504 Gateway Timeout errors in the last three days. I’ve also observed that the model’s generation speed has slowed down considerably.


Hi @zakarialogarithm1 - I am unable to recreate the error on my side.

Please let me know if you are still having issues with the deepseek-ai NIM API and/or the Llama-3.3-70b-instruct NIM API and I will get our NIM team to look into it.

Best,

Sophie

Hi, I was able to recreate the error.

Thanks for the screenshot - I’m seeing the error on my end too now. I’ve shared with the team to get them to look into a fix.

Best,

Sophie

The team has redeployed the model and it is now working as expected. Thanks for bringing the issue to our attention.

Best,

Sophie

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.