My server currently has two NVIDIA A30 GPUs. I’ve updated to the latest driver and CUDA version, but when I start the machine, both GPUs initially show a temperature of around 50°C. However, within a few minutes, the temperature of one GPU gradually climbs to 89°C, with its power draw around 100 W even though it is idle.
I’ve tried swapping slots, changing PCIe cables, updating the driver, and more, but the issue remains unresolved. It seems like a hardware-related problem.
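To capture the ramp and compare runs (for example across driver versions), a minimal logging sketch like the following can help. It assumes nvidia-smi is on PATH and Python 3 is available; the output file name gpu_idle_log.csv is just an example, and the query field names are standard but worth double-checking with `nvidia-smi --help-query-gpu` for your driver.

```python
# Rough sketch: log idle temperature, power, utilization and SM clock per GPU.
# Assumes nvidia-smi is on PATH; field names can be verified with
# `nvidia-smi --help-query-gpu`.
import csv
import subprocess
import time

FIELDS = "index,temperature.gpu,power.draw,utilization.gpu,clocks.sm"

def snapshot():
    out = subprocess.run(
        ["nvidia-smi",
         f"--query-gpu={FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # One line per GPU, e.g. "0, 89, 102.3, 0, 1440"
    return out.splitlines()

if __name__ == "__main__":
    with open("gpu_idle_log.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time"] + FIELDS.split(","))
        for _ in range(360):  # roughly one hour at 10 s intervals
            ts = time.strftime("%H:%M:%S")
            for row in snapshot():
                writer.writerow([ts] + [v.strip() for v in row.split(",")])
            f.flush()
            time.sleep(10)
```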
Has this situation only occurred after a change of driver, or has the hardware configuration changed?
If the latter, be aware that these cards have no fans and are designed to be installed in enclosures that provide adequate airflow from external fans. That said, it does seem strange for card 0 to be drawing 102 W at 0% load, unless it is damaged.
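One quick way to narrow down driver vs. hardware is to check whether the hot card is stuck in a high performance state (P0 with high clocks) instead of dropping to its idle state when nothing is running. A rough sketch, assuming nvidia-smi is on PATH; pstate, clocks.sm and clocks.mem are standard query fields:

```python
# Sketch: report the performance state and clocks of each GPU at idle.
# A card sitting at P0 with high SM clocks and 0% utilization suggests a
# driver/software cause rather than physical damage.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,pstate,clocks.sm,clocks.mem,power.draw,utilization.gpu",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.strip().splitlines():
    idx, pstate, sm, mem, power, util = [v.strip() for v in line.split(",")]
    print(f"GPU {idx}: pstate={pstate}, sm_clock={sm}, mem_clock={mem}, "
          f"power={power}, util={util}")
```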
Thank you for your reply.
We recently started using both GPUs; previously, we were only using one GPU. When I noticed abnormal temperatures on GPU0, I tried the above methods to fix it.
The A30 is passively cooled, so if its temperature rises, the card itself must be drawing more power, even though no processes are running. I also believe this is a hardware issue.
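For completeness, this is roughly how the "no processes running" claim can be double-checked while the card heats up. It assumes nvidia-smi is on PATH; the compute-app field names can be verified with `nvidia-smi --help-query-compute-apps`.

```python
# Sketch: confirm nothing is actually running on either GPU while it heats up.
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=gpu_uuid,pid,process_name",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()

print(out if out else "No compute processes found on any GPU.")
```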
You are using driver 570. I reported a small power-consumption increase on an unused GPU here: Increased idle consumption with driver 570. I’m not sure it is related, but it’s interesting.
Have you tried downgrading to driver 565 to see if the problem still happens?