Models are very very slow

Hi Nvidia Team,

I am trying to use the Qwen3 a32 80b instruct model and DeepSeek V3.1 Terminus, and the inference speed is very slow. It takes 30 seconds to start a reply. Sometimes it crashes and reports an error connecting. Can you look into this?

Hi radhaekrishna1433,

Could you please post logs showing the reply speed and the connection error messages?
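
If it helps with collecting those numbers, here is a minimal sketch of one way to time a streamed reply. It assumes you can get the reply as an iterable of byte chunks from whichever HTTP client you use. The names `time_stream` and `chunks` are hypothetical and are not part of any NVIDIA or Qwen API. To include connection time in the measurement, pass a generator that only sends the request once iteration starts.

```python
import time
from typing import Callable, Iterable


def time_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a streamed reply and record basic timing (hypothetical helper).

    Returns the delay before the first chunk, the total time, the number of
    chunks and bytes received, and the error text if the stream failed.
    """
    start = clock()
    first_chunk_s = None
    count = 0
    total_bytes = 0
    error = None
    try:
        for chunk in chunks:
            if first_chunk_s is None:
                # Delay between starting to read and the first data arriving.
                first_chunk_s = clock() - start
            count += 1
            total_bytes += len(chunk)
    except Exception as exc:
        # Keep the exception type and message so it can be pasted into a post.
        error = f"{type(exc).__name__}: {exc}"
    return {
        "first_chunk_s": first_chunk_s,
        "total_s": clock() - start,
        "chunks": count,
        "bytes": total_bytes,
        "error": error,
    }
```

Printing the returned dictionary for a few slow or failed requests, along with the model name and the time of day, should give enough detail to post here.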

I will attempt to escalate this to the Qwen team.

Thanks,

AHarpster