[GTX760] Problems with Fan Controller

Hey,

I know that this is a well-known issue and that there are already a lot of topics about it.

My System:
GTX 760
Ubuntu 18.04
Current Driver: 440.5
Tested Drivers: 390, 415, 430, 440

My problem is that since I installed the Nvidia driver on my system the GPU continuously ramps up the fans (0rpm-~2500rpm).
Before that, there were no problems with the fans, and there are no problems in the BIOS, so I think it’s not a hardware issue.
Furthermore, when I run some benchmarks to push the GPU temperature to 70+C and then let it cool down, it cools down normally and stays quiet (~1400rpm) for about 20 mins.
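
To capture the ramping pattern described above over time, a small logging loop around nvidia-smi may help. This is only a sketch: format_sample and log_fan are made-up helper names, and it assumes a single GPU that reports fan speed and temperature through nvidia-smi (as the GTX 760 does in the output further down).

```shell
#!/bin/sh
# format_sample (hypothetical helper): turns one CSV line from nvidia-smi,
# e.g. "48, 40" (fan percent, temperature), into a labelled log line.
format_sample() {
    echo "$1" | awk -F', *' '{ printf "fan=%s%% temp=%sC\n", $1, $2 }'
}

# log_fan (hypothetical helper): polls nvidia-smi every $1 seconds
# (default 5) and prints a timestamped sample until interrupted with Ctrl+C.
# Assumes a single GPU, so nvidia-smi returns exactly one line per query.
log_fan() {
    interval=${1:-5}
    while line=$(nvidia-smi --query-gpu=fan.speed,temperature.gpu \
                            --format=csv,noheader,nounits); do
        printf '%s %s\n' "$(date +%T)" "$(format_sample "$line")"
        sleep "$interval"
    done
}
```

Running `log_fan 5 | tee fan.log` while the system is idle would show whether the fan percentage climbs while the temperature stays flat.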


What I have tried so far:

  • changing the drivers (390, 415, 430, 440)
    => no changes

  • set Coolbits to 4, 12 or 28
    => enables the manual fan control option, but the GPU ignores any fixed value.
    => could not figure out what 12 or 28 do, but found them in some solutions

  • nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=99'
    => no changes

  • install nvclock
    => can’t locate the NVClock package, I guess it’s deprecated

  • install lm-sensors (fan-control)
    => sensors: shows the temperatures but no pwm/fanspeed
    => fan control: can’t change the fan speed (no pwm sensor)

  • reboot the system
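
On the Coolbits values 4, 12 and 28 from the list above: Coolbits is a bitmask, so 12 and 28 are 4 plus further bits. The sketch below decodes a value into its bits. The bit meanings are paraphrased from the Coolbits entry in the NVIDIA Linux driver README as I understand it, and decode_coolbits is a made-up helper name, not part of any NVIDIA tool.

```shell
#!/bin/sh
# decode_coolbits (hypothetical helper): prints one line per bit set in the
# given Coolbits value. Descriptions are paraphrased from the NVIDIA README.
decode_coolbits() {
    value=$1
    for bit in 1 2 4 8 16; do
        if [ $(( $value & $bit )) -ne 0 ]; then
            case $bit in
                1)  echo "1: overclocking for older (pre-Fermi) GPUs" ;;
                2)  echo "2: try SLI across GPUs with different memory sizes" ;;
                4)  echo "4: manual fan speed control" ;;
                8)  echo "8: extra overclocking settings on the PowerMizer page" ;;
                16) echo "16: overvoltage via nvidia-settings" ;;
            esac
        fi
    done
}

decode_coolbits 28
```

So only bit 4 matters for fan control; 12 and 28 add clock and voltage options on top of it and should not change fan behaviour.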

Screenshots / Logs

~$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +35.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +29.0°C  (high = +80.0°C, crit = +100.0°C)
~$ nvidia-smi
Wed Oct 30 13:23:02 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.26       Driver Version: 440.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 760     Off  | 00000000:01:00.0 N/A |                  N/A |
| 48%   40C    P0    N/A /  N/A |    223MiB /  4034MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
$ sudo nvidia-xconfig -t

Using X configuration file: "/etc/X11/xorg.conf".

    ServerLayout "Layout0"
        |
        |--> Screen "Screen0"
        |       |
        |       |--> Monitor "Monitor0"
        |       |       |
        |       |       |--> VendorName "Unknown"
        |       |       |--> ModelName "Unknown"
        |       |       |--> HorizSync  
        |       |       |--> VertRefresh  
        |       |       |--> Option "DPMS"
        |       |
        |       |--> Device "Device0"
        |       |       |--> Driver "nvidia"
        |       |       |--> VendorName "NVIDIA Corporation"
        |       |
        |       |--> Option "Coolbits" "4"
        |       |--> DefaultColorDepth 24

Driver GUI
https://imgur.com/a/qzzySdv

I have been trying to solve this problem for 3 days now, and I hope someone has a new idea for how I could fix it.

The fan curve is hardcoded by the manufacturer in the VBIOS. On some GPUs, manual fan control is blocked by the manufacturer.
Since you have a Kepler device, which is reflashable, you might look into modifying the VBIOS.

OK, changing the VBIOS would really be the last option for me. The manufacturer is Gainward, and I haven’t found any hints that they block the fan control.
I have now removed the NVIDIA driver again, and the fans work as they are supposed to. So I still hope that there is some driver/software solution for my problem.

When the NVIDIA driver is removed, nouveau takes over and the GPU only runs at minimum clocks, so the fans don’t need to spin up.
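
To confirm which kernel driver is actually loaded at a given moment, the module list can be checked. A minimal sketch, assuming the standard lsmod output format; gpu_driver_from_lsmod is a made-up helper name.

```shell
#!/bin/sh
# gpu_driver_from_lsmod (hypothetical helper): reads lsmod-style text on
# stdin and prints "nvidia", "nouveau", or "none", whichever GPU kernel
# module appears first. Helper modules such as nvidia_drm are ignored.
gpu_driver_from_lsmod() {
    awk '$1 == "nvidia" || $1 == "nouveau" { print $1; found = 1; exit }
         END { if (!found) print "none" }'
}
```

Usage would be `lsmod | gpu_driver_from_lsmod`. Seeing "nouveau" after removing the proprietary driver matches the explanation above.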