After I installed my Tesla P100-PCIE-16GB GPU, everything initially seemed to work fine. However, after about two hours of use, the card stopped working. Running nvidia-smi now returns the following error:
desktop:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:01:00.0: Unknown Error
My GPU reaches 84°C, and I'm concerned that this temperature might be causing the issue. Is it normal for a GPU to restart or stop working at 84°C, or could other factors be contributing to this failure?
Thanks for your reply, @MarkusHoHo! I've already purchased a cooling system, which should arrive in about a week. I'll let you know how it performs once it's installed.