Recently bought a P104-100 card to run headless for learning CUDA; fans spin up to 100% as soon as the driver loads.

Hello, I am trying to use a Gigabyte P104-100 card I just bought to learn CUDA and GPU computing, and perhaps also as an additional OpenCL compute card. I’m on a very tight budget, so this was basically my Christmas gift to myself.

The card has three fans normally but one is missing.

The problem is that, after successfully installing the CUDA toolkit and the NVIDIA driver (including the modprobe tool), the remaining two fans on the P104-100 spin up to 100% as soon as I reboot the machine, X starts to load, and the nvidia modules load. This happens even when xorg.conf configures the AMD card in the first slot as the only display card and either says nothing about the Nvidia card or explicitly says it’s not set to display. It also happens when there is absolutely no load on the card at all.

I realize this may be a safety feature. Is it possible to disable it while I order a replacement fan? Also, will having all three fans in place on the card fix the spin-up to 100%?

My case has plenty of ventilation: two large 120mm fans in the front, one in the back, and two stacked Noctua CPU fans. Temperatures are very cool and, apart from the Gigabyte card, the machine is almost silent except when running at full load.

Here are my kernel version (uname -a) and card info:

4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u1 (2019-09-20) x86_64 GNU/Linux

nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version --format=csv
name, pci.bus_id, vbios_version
P104-100, 00000000:03:00.0, 86.04.7A.00.30
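
The same query interface can also report fan speed, temperature, and utilization, which helps confirm whether the fans really are at 100% while the card is idle. Below is a minimal sketch, not a definitive tool: the check_fans helper, the 90% fan threshold, and the 5% "idle" utilization cutoff are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Flag GPUs whose fan is at or above a threshold while utilization is low.
# Reads "name, fan %, temp, util %" CSV rows (no header, no units) on stdin.
# check_fans, the default threshold, and the 5% idle cutoff are hypothetical.
check_fans() {
    threshold="${1:-90}"
    awk -F', *' -v max="$threshold" '
        { status = ($2 + 0 >= max && $4 + 0 < 5) ? "FAN HIGH AT IDLE" : "ok"
          printf "%s: fan %s%%, %s C, util %s%% -> %s\n", $1, $2, $3, $4, status }'
}

# Only query the driver when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,fan.speed,temperature.gpu,utilization.gpu \
               --format=csv,noheader,nounits | check_fans 90
fi
```

Running this once before X starts and once after the nvidia modules load should show whether the driver itself reports the high fan speed or whether the card is doing it on its own.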

I’ve tried these (Debian packaged) driver versions:

nvidia-driver 430.64-4 with nvidia-cuda-toolkit 10.1.168-3 (testing)
(has fan problem)

nvidia-driver 418.74-1 with nvidia-cuda-toolkit 9.2.148-7 (stable)
(has fan problem)

nvidia-driver 390.116-1 with nvidia-cuda-toolkit 8.0.44-4 (oldstable)
(no fan problem, but it had other problems with a compiler mismatch,
and the CUDA version was only 8, so I never got it to “work”)

Some other details: ACPI is NOT installed.

I let the motherboard’s BIOS handle the case fan speeds, and it does that really well.

Thank you!

Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/

Thank you for your help!

I read somebody’s suggestion elsewhere to try a purge and reinstall.

I purged everything from the system and reinstalled, all with the newest testing versions, and so far so good. It’s quiet, and nvidia-smi reports the card as working!
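
For anyone wanting to repeat this, a purge-and-reinstall cycle on Debian might look roughly like the sketch below. The exact commands used here were not recorded, so the steps are an assumption; only the package names nvidia-driver and nvidia-cuda-toolkit come from this thread. The run and reinstall_nvidia helpers are hypothetical, and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch of a purge-and-reinstall cycle on Debian (assumed steps, see above).
# Dry run by default: set APPLY=1 to execute the commands instead of printing.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

reinstall_nvidia() {
    # Remove the packages named in the thread, including their config files.
    run sudo apt-get purge nvidia-driver nvidia-cuda-toolkit
    # Drop dependencies that were only pulled in by those packages.
    run sudo apt-get autoremove --purge
    # Reinstall both from testing (assumes a testing source is configured).
    run sudo apt-get install -t testing nvidia-driver nvidia-cuda-toolkit
}

reinstall_nvidia
```

Review the printed commands first, then rerun with APPLY=1 and reboot so the freshly built kernel modules are the ones that load.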

~$ nvidia-smi
Wed Dec 25 14:16:27 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  P104-100            Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   39C    P0    38W / 180W |      0MiB /  4042MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+