I have a Precision 7920 Tower Workstation with an NVIDIA RTX A6000 GPU. While training a network, the GPU temperature goes above 80°C.
What is the maximum/suitable temperature for the NVIDIA RTX A6000?
Regards
Welcome to the NVIDIA Developer forums! This is the NVAPI category; your question should be posted in the GPU Hardware section of the forums. I will go ahead and move it over for you.
Optimal temp is “lower” ;-).
Turboboost kicks in dynamically based on actual temp, so the cooler, the more likely to gain a few more Hz for a few more (m)seconds…
I would have to look up the exact throttling and shutdown temps for A6000, but they are usually around 90°…
Running under load, at 80° is absolutely fine!
As long as you run a certified system in the specified environment, the system and GPU will make sure the GPU never overheats. Only if airflow to the GPU is blocked (e.g. by cables), or the system is run in too hot an environment (uncooled enclosure, A/C failure in the (server) room), would the high temps cause early throttling and then shutdown, so the system would not reach its nominal perf…
I share the concern that started this thread, and elsewhere in this forum I find concerns expressed about the lifetime of GPUs. So I was trying to find the allowable limits for my GPU, i.e. an NVIDIA RTX PRO 6000 Blackwell Workstation Edition in a Dell Precision. The temperature section of nvidia-smi -q shows this output:
Temperature
    GPU Current Temp                  : 77 C
    GPU T.Limit Temp                  : 15 C
    GPU Shutdown T.Limit Temp         : -5 C
    GPU Slowdown T.Limit Temp         : -2 C
    GPU Max Operating T.Limit Temp    : 0 C
    GPU Target Temperature            : N/A
which is kind of weird! Perhaps a driver issue? Here is my driver version from the same command.
Hi, we did change some of our temp logic and temp sensor readings a while back, and might have forgotten to make the nvidia-smi output friendly for end users to understand…
In essence, we introduced T(limit) as the distance to the temperature at which a SW slowdown will kick in, so 15°C in your case…
T(limit)=0 would indicate that SW slowdown kicks in; 2° hotter, HW slowdown will kick in; and 5° hotter, HW shutdown will happen to protect the GPU…
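To make the arithmetic concrete, here is a minimal Python sketch that reads the temperature section pasted earlier in this thread and estimates the temperatures at which each protection step would trigger. The function names `parse_temperature_section` and `estimate_thresholds` are hypothetical helpers, not part of any NVIDIA tool, and the sketch assumes that T(limit) shrinks by 1 for every 1°C rise in the reported "GPU Current Temp", which may not hold exactly because the two readings can come from different sensors.

```python
import re

# Temperature section as posted in this thread (nvidia-smi -q).
SAMPLE = """\
Temperature
    GPU Current Temp                  : 77 C
    GPU T.Limit Temp                  : 15 C
    GPU Shutdown T.Limit Temp         : -5 C
    GPU Slowdown T.Limit Temp         : -2 C
    GPU Max Operating T.Limit Temp    : 0 C
    GPU Target Temperature            : N/A
"""


def parse_temperature_section(text):
    """Return {field name: degrees C} for lines such as 'GPU Current Temp : 77 C'.

    Lines without a numeric value (the header, 'N/A' fields) are skipped.
    """
    readings = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(-?\d+)\s*C\s*$", line)
        if match:
            readings[match.group(1)] = int(match.group(2))
    return readings


def estimate_thresholds(readings):
    """Estimate absolute trigger temperatures from the T.Limit margins.

    Assumption (not confirmed by NVIDIA): T.Limit drops 1:1 as the
    reported current temperature rises.
    """
    current = readings["GPU Current Temp"]
    margin = readings["GPU T.Limit Temp"]  # distance to SW slowdown

    def temp_at(t_limit_value):
        # Temperature at which T.Limit would equal the given value.
        return current + margin - t_limit_value

    return {
        "sw_slowdown": temp_at(readings["GPU Max Operating T.Limit Temp"]),
        "hw_slowdown": temp_at(readings["GPU Slowdown T.Limit Temp"]),
        "hw_shutdown": temp_at(readings["GPU Shutdown T.Limit Temp"]),
    }


if __name__ == "__main__":
    for name, temp in estimate_thresholds(parse_temperature_section(SAMPLE)).items():
        print(f"{name}: ~{temp} C")
```

For the readings above this gives roughly 92°C for SW slowdown, 94°C for HW slowdown, and 97°C for HW shutdown, under the stated 1:1 assumption.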
Generally, any current/average temp readings up to high 80s are still fully within certification limits! = no reason to worry.
If you have a simple way to keep the GPU cooler, fine: any slowdown will only kick in later, and overall, cooler electronic components last longer than hotter ones.
That said, Nvidia and their OEM partners guarantee full functionality of the GPU in a certified chassis and a specified environmental temperature for the full warrantied time of the product…
I have requested input from product management to consider making the nvidia-smi readings easier for end users to understand…