Is it normal for my Tesla P100-PCIE-16GB GPU to restart at 84°C?


When I installed my Tesla P100-PCIE-16GB GPU, everything seemed to be working fine initially. However, after about 2 hours of use, the card stopped functioning. When I run the nvidia-smi command, I get the following error:

desktop:~$ nvidia-smi
Unable to determine the device handle for GPU0000:01:00.0: Unknown Error

I’m concerned that the temperature of 84°C might be causing the issue. Is it normal for my GPU to restart or stop working at this temperature? Should I be worried about the 84°C temperature on my GPU, or could there be other factors contributing to this failure?

Hi @snippetdeveloper and welcome to the NVIDIA developer forums.

Check out this document under “Thermal Specifications”: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/solutions/resources/documents1/NV-tesla-p100-pcie-PB-08248-001-v01.pdf

I would say yes, you should be concerned: 84°C is not a good situation for the P100.

Is there any chance you could re-paste the card or improve its cooling?

Thanks!

Thanks for your reply @MarkusHoHo ! I’ve already purchased a cooling system, which I should receive in a week. I’ll let you know how it performs once it arrives.

Thanks, @MarkusHoHo! That was the problem, and installing the fans fixed it. The card now runs at 38°C, which I think is normal. Or would you suggest aiming lower?
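
For anyone who wants to confirm a cooling fix under sustained load, here is a minimal Python sketch that polls the temperature through nvidia-smi's `--query-gpu=temperature.gpu` option. The function names (`parse_temperatures`, `read_gpu_temperatures`, `over_threshold`, `watch`) are hypothetical helpers made up for this example, and the 80°C alert level is an assumed placeholder, not a value taken from NVIDIA's specification. Check the product brief for the real limits.

```python
import subprocess
import time


def parse_temperatures(output):
    """Return one integer temperature (°C) per numeric line of nvidia-smi output.

    Non-numeric lines (for example "[N/A]") are skipped rather than raising.
    """
    temps = []
    for line in output.splitlines():
        value = line.strip()
        if value.isdigit():
            temps.append(int(value))
    return temps


def read_gpu_temperatures():
    """Query the current temperature of every GPU that nvidia-smi can see."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_temperatures(result.stdout)


def over_threshold(temps, limit_c=80):
    """Return (gpu_index, temperature) pairs at or above limit_c.

    limit_c=80 is an assumed alert level chosen for illustration only.
    """
    return [(i, t) for i, t in enumerate(temps) if t >= limit_c]


def watch(interval_s=5, limit_c=80):
    """Print temperatures every interval_s seconds until interrupted."""
    while True:
        temps = read_gpu_temperatures()
        print("temperatures (°C):", temps)
        for index, temp in over_threshold(temps, limit_c):
            print(f"warning: GPU {index} is at {temp}°C")
        time.sleep(interval_s)
```

Calling `watch()` in one terminal while a workload runs in another shows whether the card stays cool over time. If nvidia-smi itself fails, as in the error at the top of this thread, `subprocess.run` raises `CalledProcessError` instead of returning a reading.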
