Hi Nvidia Team,
I am trying to use the Qwen3 a32 80b instruct model and DeepSeek v3.1 Terminus, and the inference speed is very slow. It takes 30 seconds to start a reply. Sometimes it crashes and reports an error connecting. Can you look into this?
Hi radhaekrishna1433,
Could you please post logs showing the reply speed and the connection error messages?
I will attempt to escalate this to the Qwen team.
Thanks,
AHarpster