To check for PSU issues, you can limit the GPU clocks to avoid the power spikes caused by GPU Boost:
nvidia-smi -lgc 300,1500
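If you want to run this as a repeatable test, here is a rough sketch that caps the clocks, watches the card while it is under load, and restores the defaults afterwards. The function names and the `DRY_RUN` switch are made up for illustration. `-rgc` (reset GPU clocks) and the `--query-gpu` fields are standard nvidia-smi options but weren't part of the suggestion above, so check `nvidia-smi --help` on your driver version. Locking clocks usually needs root.

```shell
#!/usr/bin/env bash
# Sketch only: cap GPU clocks, watch the card, then restore defaults.
# Assumes one NVIDIA GPU, a driver that supports -lgc/-rgc, and root.
# lock_clocks/watch_gpu/reset_clocks and DRY_RUN are hypothetical helpers.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Lock graphics clocks to MIN,MAX MHz so boost can't push past MAX.
lock_clocks() {
  local min_mhz="${1:-300}" max_mhz="${2:-1500}"
  run nvidia-smi -lgc "${min_mhz},${max_mhz}"
}

# Print SM clock, board power draw and GPU core temperature every second.
# temperature.gpu is the core temperature, not the memory temperature.
watch_gpu() {
  run nvidia-smi --query-gpu=clocks.sm,power.draw,temperature.gpu --format=csv -l 1
}

# Undo lock_clocks and go back to the driver's default boost behaviour.
reset_clocks() {
  run nvidia-smi -rgc
}
```

Source it in a root shell, call `lock_clocks`, run your workload with `watch_gpu` going in another terminal, then call `reset_clocks`. If the crashes stop with the cap in place, that points toward power spikes and the PSU.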
Apart from the GPU temperature, there’s also the memory temperature, which unfortunately can’t be read on Linux. So while 85°C isn’t great but is still OK for the GPU, the memory might be overheating.
https://forums.developer.nvidia.com/t/request-gpu-memory-junction-temperature-via-nvidia-smi-or-nvml-api/168346