Experiencing 504 Gateway Timeout & Slower Inference Speed with LLaMA 3.3 70B on NIM API

I’ve been using the NVIDIA NIM API to run inference with the LLaMA 3.3 70B-Instruct model.

Recently, I have encountered repeated issues where the API returns:
Error code: 504 (Gateway Timeout)

This has happened multiple times over the past three days. In addition, I have noticed that generation speed is significantly slower than usual.
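
For reference, here is a minimal sketch of the kind of call that hits the timeout, assuming the OpenAI-compatible chat completions endpoint at integrate.api.nvidia.com and the official openai Python client; the model id meta/llama-3.3-70b-instruct and the simple retry wrapper are illustrative rather than my exact setup:

```python
import os
import time

from openai import OpenAI, APITimeoutError, InternalServerError

# NIM's OpenAI-compatible endpoint; reads the API key from NVIDIA_API_KEY.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

def chat_with_retry(prompt: str, retries: int = 3) -> str:
    """Send one chat request, backing off and retrying on gateway timeouts."""
    for attempt in range(retries):
        try:
            completion = client.chat.completions.create(
                model="meta/llama-3.3-70b-instruct",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.2,
                max_tokens=1024,
            )
            return completion.choices[0].message.content
        except (APITimeoutError, InternalServerError):
            # A 504 from the gateway surfaces as a 5xx error here.
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

print(chat_with_retry("Hello!"))
```

Even with the backoff, the 504s keep recurring, which is why I suspect something on the platform side.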

Have there been any updates, server-side changes, or system load issues on the NIM platform recently?

Any help or clarification would be greatly appreciated. Thank you!

Hi @orz0310tw, Thanks for bringing this to our attention!

We have identified some issues with this model and have deployed updates to fix them.

Can you please confirm that the API calls are now working as expected?

Best,

Sophie