Dear NVIDIA and forum members,
We run many algorithms that fully load two Tesla T4 GPUs.
I received a report that performance drops after 2 hours of running.
I suspect the performance drop is caused by thermal throttling.
So I checked the temperature information:
```
$ nvidia-smi -q -d temperature

==============NVSMI LOG==============

Timestamp                           : Mon Mar 14 11:44:26 2022
Driver Version                      : 450.102.04
CUDA Version                        : 11.0

Attached GPUs                       : 2
GPU 00000000:18:00.0
    Temperature
        GPU Current Temp            : 69 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        GPU Max Operating Temp      : 85 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A

GPU 00000000:3B:00.0
    Temperature
        GPU Current Temp            : 66 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
        GPU Max Operating Temp      : 85 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
```
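Because the drop only shows up after about two hours, a single snapshot like the one above may not be enough; capturing this output periodically and computing the headroom to each threshold makes it easier to see whether the cards ever approach them. Below is a minimal sketch, assuming the text format shown above (with or without nvidia-smi's indentation). The helper names `parse_temperature_log` and `headroom` are hypothetical, written for illustration only.

```python
import re
from typing import Dict, Optional

# Matches lines such as "GPU Current Temp : 69 C", with or without leading indentation.
# Fields reported as "N/A" (memory temperature on the T4) do not match and are skipped.
FIELD_RE = re.compile(r"^\s*(?P<key>[A-Za-z ]+?)\s*:\s*(?P<value>\d+) C\s*$")
# Matches the per-GPU header line, e.g. "GPU 00000000:18:00.0".
GPU_RE = re.compile(
    r"^\s*GPU (?P<bus_id>[0-9A-Fa-f]{8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])\s*$"
)


def parse_temperature_log(text: str) -> Dict[str, Dict[str, int]]:
    """Return {bus_id: {field_name: degrees_C}} for every numeric temperature field."""
    gpus: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        gpu_match = GPU_RE.match(line)
        if gpu_match:
            current = gpu_match.group("bus_id")
            gpus[current] = {}
            continue
        field_match = FIELD_RE.match(line)
        if field_match and current is not None:
            gpus[current][field_match.group("key")] = int(field_match.group("value"))
    return gpus


def headroom(fields: Dict[str, int]) -> Dict[str, int]:
    """Degrees C remaining between the current temperature and each threshold."""
    now = fields["GPU Current Temp"]
    return {name: limit - now for name, limit in fields.items() if name != "GPU Current Temp"}
```

For the GPU at 00000000:18:00.0 in the log above, this gives 16 C of headroom to GPU Max Operating Temp and 24 C to GPU Slowdown Temp. On a machine with the driver installed, the input could come from `subprocess.run(["nvidia-smi", "-q", "-d", "temperature"], capture_output=True, text=True).stdout`, collected in a loop over the 2-hour window.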
However, the GPU Slowdown Temp (93 C) is well above the current readings (69 C and 66 C), so I don't think the Tesla T4s reach that temperature.
So I would like to know when Tesla T4 performance starts to drop (i.e., does it only happen once the GPU temperature reaches the GPU Slowdown Temp?).
I would also appreciate any information related to this problem, such as the T4's voltage-frequency management policy.
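Temperature alone does not say why clocks were reduced. NVML also reports a "clocks throttle reasons" bitmask, which nvidia-smi exposes as the `clocks_throttle_reasons.active` field. It can be logged next to temperature and SM clock with, for example, `nvidia-smi --query-gpu=timestamp,index,temperature.gpu,clocks.sm,power.draw,clocks_throttle_reasons.active --format=csv,noheader,nounits -l 60`. Below is a minimal sketch that decodes that bitmask so thermal slowdown can be told apart from, e.g., power capping. The bit values are the `nvmlClocksThrottleReason*` constants from `nvml.h` as I recall them and should be checked against the header for driver 450. The helper names `decode_throttle_reasons`, `is_thermally_throttled`, and `parse_row` are hypothetical.

```python
from typing import Dict, List

# Assumed bit values of NVML's nvmlClocksThrottleReason* constants (verify against nvml.h).
THROTTLE_REASONS: Dict[int, str] = {
    0x001: "GPU idle",
    0x002: "Applications clocks setting",
    0x004: "SW power cap",
    0x008: "HW slowdown",
    0x010: "Sync boost",
    0x020: "SW thermal slowdown",
    0x040: "HW thermal slowdown",
    0x080: "HW power brake slowdown",
    0x100: "Display clock setting",
}
THERMAL_MASK = 0x020 | 0x040  # SW + HW thermal slowdown bits


def decode_throttle_reasons(value: str) -> List[str]:
    """Turn a bitmask such as '0x0000000000000020' into reason names."""
    mask = int(value, 16)  # int() accepts the '0x' prefix when base is 16
    reasons = [name for bit, name in THROTTLE_REASONS.items() if mask & bit]
    unknown = mask & ~sum(THROTTLE_REASONS)  # bits not listed in the table above
    if unknown:
        reasons.append(f"unknown bits 0x{unknown:x}")
    return reasons


def is_thermally_throttled(value: str) -> bool:
    return bool(int(value, 16) & THERMAL_MASK)


def parse_row(row: str) -> dict:
    """Parse one CSV line from the query above (column order must match the query)."""
    timestamp, index, temp, sm_clock, power, reasons = (f.strip() for f in row.split(","))
    return {
        "timestamp": timestamp,
        "gpu": int(index),
        "temp_c": int(temp),
        "sm_clock_mhz": int(sm_clock),
        "power_w": power,  # kept as text; some boards report "[N/A]"
        "reasons": decode_throttle_reasons(reasons),
    }
```

If the SM clock falls after two hours while only the "SW power cap" bit is set, the drop would point to the power limit rather than temperature; if one of the thermal bits appears, it would point to thermal slowdown even below the 93 C GPU Slowdown Temp.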
Best Regards,
Andre