Subject: Fan Control Issue on Open NVIDIA Driver 570.86.15 – Fan Stuck at 30% Under Load (nvidia-ml-py Attempted)
Hello,
I’m reporting a fan control issue on my headless Ubuntu 24.04.1 LTS system running the NVIDIA open driver (version 570.86.15). Under heavy load, the GPU fan remains fixed at 30% even as the GPU temperature rises significantly.
System Configuration
- OS: Ubuntu 24.04.2 LTS (Noble Numbat) (headless, accessed via SSH)
- Kernel Version: 6.8.0-53-generic
- GPU: NVIDIA RTX A2000
- Driver: NVIDIA open driver, version 570.86.15
- CUDA Version: 12.8
Issue Description
When I run a stress test with ./gpu_burn 300, the GPU temperature climbs to 95°C under load, yet the fan speed stays at 30% throughout the test. For example, the output from gpu_burn shows:
...
42.3% proc'd: 544 (3989 Gflop/s) errors: 0 temps: 86 C
Summary at: Sun Feb 16 18:43:29 CET 2025
...
100.0% proc'd: 864 (923 Gflop/s) errors: 0 temps: 95 C
Killing processes with SIGTERM (soft kill)
Freed memory for dev 0
Uninitted cublas
done
Tested 1 GPU:
GPU 0: OK
The current output from nvidia-smi confirms that the fan is at 30% despite the high temperature:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.15              Driver Version: 570.86.15      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A2000               Off |   00000000:07:00.0  On |                    0 |
| 30%   96C    P2             43W /   70W |    4734MiB /   5754MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           28139      C   ./gpu_burn                             4716MiB |
+-----------------------------------------------------------------------------------------+
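For anyone who wants to log the same two readings over a whole run instead of taking a single snapshot, here is a minimal Python sketch. It assumes nvidia-smi is on the PATH and uses its fan.speed and temperature.gpu query fields; the helper names parse_sample and read_sample are hypothetical, chosen only for this illustration.

```python
import subprocess

# Ask nvidia-smi for fan speed (%) and GPU temperature (°C) of GPU 0 as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=fan.speed,temperature.gpu",
    "--format=csv,noheader,nounits",
    "-i", "0",
]


def parse_sample(line):
    """Parse one CSV line such as '30, 96' into (fan_percent, temp_c)."""
    fan_text, temp_text = [field.strip() for field in line.split(",")]
    # Boards without a readable fan report a bracketed placeholder like '[N/A]'.
    fan = None if fan_text.startswith("[") else int(fan_text)
    return fan, int(temp_text)


def read_sample():
    """Run the query once and return (fan_percent, temp_c) for GPU 0."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_sample(result.stdout.strip().splitlines()[0])
```

Calling read_sample() every few seconds while gpu_burn runs gives a simple fan/temperature trace that can be attached to a report like this one.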
I also attempted to influence the fan speed by setting a target temperature using:
sudo nvidia-smi --gpu-target-temp=65 -i 0
Yet, the fan speed remains unchanged.
Attempted Workaround Using nvidia-ml-py
In an effort to gain manual control over the fan speed, I explored the Python NVML wrapper, nvidia-ml-py, and also tested a script from the repository fan_control_nvidia-ml-py. Unfortunately, neither approach changed anything: the fan continues to run at 30% regardless of the commands issued.
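For reference, the kind of NVML call sequence involved can be sketched as follows. This is not the script from the fan_control_nvidia-ml-py repository, only a minimal illustration under stated assumptions: the curve points in CURVE and the function names fan_percent_for and apply_fan_curve are hypothetical, the process is assumed to run as root, and the driver is assumed to accept nvmlDeviceSetFanSpeed_v2 for this board, which is exactly what is in question here.

```python
# Hypothetical fan curve: (temperature in °C, fan speed in %), sorted by temperature.
CURVE = ((50, 30), (70, 60), (85, 100))


def fan_percent_for(temp_c, curve=CURVE):
    """Linearly interpolate a fan percentage from the curve, clamped at both ends."""
    if temp_c <= curve[0][0]:
        return curve[0][1]
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return round(f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0))
    return curve[-1][1]


def apply_fan_curve(gpu_index=0):
    """Read the GPU temperature once and request the matching speed on every fan."""
    import pynvml  # installed by the nvidia-ml-py package

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        target = fan_percent_for(temp)
        for fan in range(pynvml.nvmlDeviceGetNumFans(handle)):
            # Raises pynvml.NVMLError if the driver rejects manual fan control.
            pynvml.nvmlDeviceSetFanSpeed_v2(handle, fan, target)
        return temp, target
    finally:
        pynvml.nvmlShutdown()
```

A manual setting of this kind stays in effect until it is changed again, so a real tool would need to call apply_fan_curve() periodically and hand control back with nvmlDeviceSetDefaultFanSpeed_v2 on exit.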
Request for Guidance
I would appreciate any insights on the following:
- Is manual (or dynamic) fan control expected to work with the current open driver (570.86.15) on headless systems?
- Are there any known workarounds or settings that can enable higher fan speeds under load using the open driver?
- Is this behavior a known limitation in the open driver stack, with future releases likely to include more complete fan control support?
Any feedback or suggestions would be greatly appreciated.
Thank you for your efforts in advancing the open driver stack!
Best regards,
Simone Flavio
nvidia-bug-report.log.gz (328.7 KB)
EDIT: I had the same problem with the 565 driver. I had hoped the upgrade would fix it after reading this entry in the 570 changelog:
“Updated the nvidia-settings control panel to use NVML rather than NV-CONTROL to control GPU clocks and fan speed.”