Driver crashes every couple of days - using purely for desktop applications

I’m using the nvidia driver only to speed up my KDE Plasma desktop environment, not for gaming or deep learning. Every couple of days I run into severe issues. Although they differ slightly, I am posting them in the same thread because I assume (and hope) they share a common cause.

  1. Extremely loud fan noise of the graphics card
  2. KDE Plasma suddenly becomes slow, but does not become unusable.
  3. KDE Plasma gradually becomes extremely slow and responds to a mouse click only once every couple of minutes (!). I spent 2 hours creating the attached log file.

In all three cases nvidia-smi confirms there is a problem. Example:

| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|ERR!   69C    P0   ERR! / 130W |   2379MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |

I have attached log files for all three cases described above (in a single zip file). In addition to 520.56.06, I also tried 515.76 and 470.141.03. The above problems occur with all three versions, but with 470.141.03 they happen more frequently (several times a day).

Directly after booting, nvidia-smi confirms that everything is okay, and I can run Unigine_Valley-1.0 with an excellent frame rate (with 470, it is better than with no driver, but far worse than with 515 or 520). A summary of my system:

Computer model:       Micro-Star International Co., Ltd. MS-7D42/MAG Z690M MORTAR WIFI (MS-7D42), BIOS B.91 10/17/2022
CPU:                  12th Gen Intel(R) Core(TM) i9-12900T
Physical cores:       16
Graphics card:        NVIDIA GeForce GTX 1660 SUPER/PCIe/SSE2
Total RAM:            125 GB
Linux distribution:   Feren OS
Ubuntu codename:      focal
Debian version:       bullseye/sid
Linux kernel version: 5.15.0-56-generic

One last piece of information: nvidia-smi itself seems to trigger the above issues. I am not sure about this, but when I run it directly after booting (when everything is still okay), I have the feeling that the issues occur more frequently than usual. Of course, for the cases in the log files, I only ran nvidia-smi after the issue arose.

At some point, the NVIDIA GPU goes into an error state, as seen in nvidia-smi, and sometimes shuts down completely.
Please monitor temperatures and check the airflow; maybe it’s just overheating due to a blocked fan. Another possible cause is that the GPU is beginning to fail.
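
One way to follow the temperature advice is to log readings periodically rather than checking by hand. Below is a minimal sketch, assuming Python 3 and that nvidia-smi is on the PATH. The fields `temperature.gpu`, `fan.speed` and `power.draw` are standard `--query-gpu` fields; the helper names, the 60-second interval, and the rule that any non-numeric value counts as a faulty reading are assumptions of this sketch, not something taken from the attached logs. Since nvidia-smi itself is suspected of triggering the problem in this thread, a long polling interval is probably the safer choice.

```python
import subprocess
import time

# Standard --query-gpu field names; the selection is an assumption of this sketch.
FIELDS = ["temperature.gpu", "fan.speed", "power.draw"]


def parse_reading(line):
    """Parse one CSV line from nvidia-smi into {field: float or None}.

    Any value that is missing or not a plain number (for example an
    error or N/A marker) becomes None.
    """
    values = [v.strip() for v in line.split(",")]
    reading = dict.fromkeys(FIELDS)  # every field starts as None
    for field, value in zip(FIELDS, values):
        try:
            reading[field] = float(value)
        except ValueError:
            reading[field] = None
    return reading


def has_error(reading):
    """Return True if at least one field could not be read as a number."""
    return any(v is None for v in reading.values())


def query_gpu():
    """Run nvidia-smi once and return the parsed reading for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_reading(out.splitlines()[0])


def monitor(interval_s=60):
    """Print a timestamped reading every interval_s seconds until interrupted."""
    while True:
        reading = query_gpu()
        flag = "ERROR" if has_error(reading) else "ok"
        print(time.strftime("%Y-%m-%d %H:%M:%S"), flag, reading, flush=True)
        time.sleep(interval_s)
```

Calling `monitor()` and redirecting the output to a file gives a timeline that shows whether the temperature was rising before a reading turned faulty.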

Thanks for your reply. I can rule out overheating as the cause, because

  1. I have already checked the temperatures regularly with nvidia-smi,
  2. the problems occur spontaneously and without prior audible fan noise when using office applications,
  3. the system is professionally built with a focus on low temperatures (large and heavy heat sinks),
  4. the problem was investigated by the manufacturer itself, who replied that these problems do not occur under Windows, even with high continuous load on the graphics card.

There also seems to be an issue with your integrated Wi-Fi after some time, which may point to a Linux incompatibility with the mainboard. Please try upgrading your kernel using the Liquorix PPA.

Thank you. I am now using the 6.0.0-13.3-liquorix-amd64 kernel and Nvidia works. For some reason, KDE Plasma was not available after updating to this kernel, but after reinstalling KDE everything is fine, including my old configuration.

I cannot provoke the issues deliberately, so I will report back in a week on whether this has solved them (or sooner, if not).

The issues have not occurred since. Many thanks for your help!

