Models are very very slow

radhaekrishna1433 · November 9, 2025, 8:33pm

Hi Nvidia Team,

I am trying to use the Qwen3 a32 80b instruct model and DeepSeek V3.1 Terminus, and inference is very slow. It takes 30 seconds for a reply to start. Sometimes it crashes with an error saying it cannot connect. Can you look into this?

Aharpster · November 12, 2025, 4:59pm

Hi radhaekrishna1433,

Could you please post logs showing the reply speed and the connection error messages?
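
If it helps, here is a minimal sketch of one way to capture those numbers. It is not an official NVIDIA tool. `time_stream` and `fake_stream` are hypothetical names used only for illustration, and you would replace `fake_stream` with whatever call your own client uses to start a streamed reply and yield its text chunks.

```python
import time
from typing import Callable, Iterable


def time_stream(start_stream: Callable[[], Iterable[str]],
                clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time one streamed reply and record any connection error.

    start_stream is a hypothetical zero-argument callable that sends the
    request and returns an iterable of text chunks.
    """
    result = {"first_chunk_s": None, "total_s": None, "chunks": 0, "error": None}
    t0 = clock()
    try:
        for _chunk in start_stream():
            if result["first_chunk_s"] is None:
                # Delay between sending the request and the first chunk.
                result["first_chunk_s"] = clock() - t0
            result["chunks"] += 1
    except OSError as exc:
        # Built-in ConnectionError and TimeoutError are OSError subclasses.
        result["error"] = f"{type(exc).__name__}: {exc}"
    result["total_s"] = clock() - t0
    return result


def fake_stream():
    # Stand-in for a real streaming request (assumption: replace this
    # with your own client call).
    time.sleep(0.05)
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    print(time_stream(fake_stream))
```

Running this a few times per model and pasting the printed dictionaries, together with the exact text of any error, would give us the time to first chunk, the total time, and the failure message in one place.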

I will attempt to escalate this to the Qwen team.

Thanks,

AHarpster
