Dear NVIDIA community,
Since installing Linux Mint last summer alongside Windows on my desktop PC, I have been using it as my daily driver, and I would be pretty happy with it if it were not for one thing: I have experienced freezes from the beginning. They only occur on Mint, not on the Windows 11 installation alongside it. Some days pass without a single freeze; on others it happens 2-3 times.
When the freeze occurs, the system doesn’t react to anything: no switching with CTRL+ALT+Fx, no CTRL+ALT+DEL… I have to power the PC off completely via the power button. If music is playing in the background when the freeze occurs (sometimes it is, sometimes it isn’t), it keeps playing in an endless loop of a ~1 second portion of the song.
I connected to my desktop PC from another machine via SSH and left “dmesg --follow” running. The moment the desktop dies again, I see the following output from dmesg:
[16257.752527] NVRM: GPU at PCI:0000:02:00: GPU-a47ca1a7-7995-cc50-707f-d504155773f0
[16257.752540] NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[16257.752544] NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.
[16257.802517] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[16299.025739] userif-3: sent link down event.
[16299.025751] userif-3: sent link up event.
Running nvidia-bug-report.sh before power-cycling the system is not possible because the SSH session also dies a few seconds later.
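Since the SSH session dies shortly after the Xid message, one way to keep whatever arrives is to store every dmesg line on the second machine as it comes in. The following is only a minimal sketch of that idea, not necessarily how the capture above was made: the function names and the regular expression are assumptions written to match the log lines quoted in this post, and the demo at the bottom feeds in two of those lines rather than a live stream.

```python
import io
import re

# Matches NVRM Xid lines in the form shown in the dmesg output above, e.g.
# [16257.752540] NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', ...
XID_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<code>\d+), (?P<rest>.*)$"
)


def parse_xid(line):
    """Return (seconds_since_boot, pci_address, xid_code, details) or None."""
    m = XID_RE.match(line.strip())
    if m is None:
        return None
    return float(m["ts"]), m["pci"], int(m["code"]), m["rest"]


def capture(lines, log):
    """Copy every line to `log` immediately and collect any Xid events.

    `lines` is any iterable of text lines (for live use: sys.stdin fed by
    the SSH session); `log` is a writable text file object.
    """
    events = []
    for line in lines:
        log.write(line)
        log.flush()  # do not hold lines in a buffer if the stream dies
        event = parse_xid(line)
        if event is not None:
            events.append(event)
    return events


if __name__ == "__main__":
    # Demo with two lines copied from the log above instead of a live stream.
    sample = [
        "[16257.752540] NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', "
        "name=<unknown>, GPU has fallen off the bus.\n",
        "[16299.025739] userif-3: sent link down event.\n",
    ]
    sink = io.StringIO()
    for event in capture(sample, sink):
        print(event)
```

For live use, the same capture function could be given sys.stdin (with the output of “dmesg --follow” piped in over SSH) and a file opened in append mode on the second machine.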
There is no obvious activity that triggers the freeze; it occurs in several situations, including:
- switching from one window to another, e.g. from Thunderbird to Chromium
- scrolling through a website, e.g. through Amazon on Firefox
- working in an RDP-session
- …
So nothing where the system would be under load. On the contrary: sometimes I can play a Steam game for six hours or render a movie in HandBrake without problems, with the fans working pretty hard, and everything seems fine.
Freezes seem to happen mostly in “banal” situations.
What I have already tried over the past months:
- updating the UEFI
- uninstalling the NVIDIA driver and using the nouveau driver (freezes also occurred, so I reinstalled the NVIDIA driver)
- using NVIDIA 525 and 470 instead of 535 (freezes occur with all three, so I’m using 535 again now)
- switching to the 6.5 kernel (makes no difference, but I’m still using it)
- checking the GPU temperature via the NVIDIA Settings application and other temperatures with “sensors” in the terminal (everything fine)
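The temperature checks above were spot checks. To see the last readings before a freeze, the GPU state could also be logged continuously. This is a hedged sketch only: it assumes nvidia-smi is on the PATH and queries the first GPU, and the chosen fields, the log file name and the one-second interval are all illustrative assumptions.

```python
import os
import subprocess
import time

# Field names accepted by `nvidia-smi --query-gpu`; the selection is an assumption.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "pstate"]


def parse_sample(csv_line):
    """Split one `--format=csv,noheader,nounits` line into a dict keyed by field."""
    values = [v.strip() for v in csv_line.strip().split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_sample():
    """Query nvidia-smi once and return the parsed sample for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu-samples.log", interval_s=1.0):  # hypothetical defaults
    """Append one sample per interval until interrupted or the machine freezes."""
    with open(path, "a", encoding="utf-8") as log:
        while True:
            log.write(repr(read_sample()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to write to disk before a hard power-off
            time.sleep(interval_s)
```

Calling log_forever() in a terminal before normal use would leave a file whose last lines show the temperature, power draw and performance state shortly before the freeze.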
I found some threads in this forum about the same issue, where the replies point towards a “hardware issue” or a “power supply problem”.
But I don’t see how it could be a hardware problem, because the issue does not occur at all on Windows, only on Linux.
Any help with debugging and solving this issue would be very much appreciated!
Thanks in advance for your assistance and best regards
Ben
Current config (if you need more information please let me know):
inxi -Fxzd
System:
Kernel: 6.5.0-14-generic x86_64 bits: 64 compiler: N/A
Desktop: Cinnamon 6.0.4 Distro: Linux Mint 21.3 Virginia
base: Ubuntu 22.04 jammy
Machine:
Type: Desktop System: Alienware product: Alienware Aurora R12 v: 1.1.23
serial: <superuser required>
Mobo: Alienware model: 0P0JWX v: A00 serial: <superuser required>
UEFI: Alienware v: 1.1.23 date: 11/08/2023
CPU:
Info: 8-core model: 11th Gen Intel Core i9-11900KF bits: 64 type: MT MCP
arch: Rocket Lake rev: 1 cache: L1: 640 KiB L2: 4 MiB L3: 16 MiB
Speed (MHz): avg: 827 high: 933 min/max: 800/5100:5300 cores: 1: 800
2: 800 3: 800 4: 925 5: 800 6: 800 7: 800 8: 800 9: 933 10: 923 11: 800
12: 800 13: 862 14: 800 15: 800 16: 800 bogomips: 112128
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: NVIDIA GA102 [GeForce RTX 3090] vendor: Dell driver: nvidia
v: 535.146.02 bus-ID: 02:00.0
Device-2: Microsoft LifeCam HD-5000 type: USB
driver: snd-usb-audio,uvcvideo bus-ID: 1-4.1:4
Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: nvidia
unloaded: fbdev,modesetting,nouveau,vesa gpu: nvidia
resolution: 3840x1600~60Hz
OpenGL: renderer: NVIDIA GeForce RTX 3090/PCIe/SSE2
v: 4.6.0 NVIDIA 535.146.02 direct render: Yes
Audio:
Device-1: Intel vendor: Dell driver: snd_hda_intel v: kernel
bus-ID: 00:1f.3
Device-2: NVIDIA GA102 High Definition Audio vendor: Dell
driver: snd_hda_intel v: kernel bus-ID: 02:00.1
Device-3: Microsoft LifeCam HD-5000 type: USB
driver: snd-usb-audio,uvcvideo bus-ID: 1-4.1:4
Device-4: GN Netcom Jabra Link 380 type: USB
driver: jabra,snd-usb-audio,usbhid bus-ID: 1-4.4.1:12
Sound Server-1: ALSA v: k6.5.0-14-generic running: yes
Sound Server-2: PulseAudio v: 15.99.1 running: yes
Sound Server-3: PipeWire v: 0.3.48 running: yes
Network:
Device-1: Intel Comet Lake PCH CNVi WiFi vendor: Rivet Networks
driver: iwlwifi v: kernel bus-ID: 00:14.3
IF: wlo1 state: down mac: <filter>
Device-2: Realtek Killer E3000 2.5GbE vendor: Dell driver: r8169
v: kernel port: 3000 bus-ID: 04:00.0
IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
IF-ID-1: vmnet1 state: unknown speed: N/A duplex: N/A mac: <filter>
IF-ID-2: vmnet8 state: unknown speed: N/A duplex: N/A mac: <filter>
Bluetooth:
Device-1: Intel AX201 Bluetooth type: USB driver: btusb v: 0.8
bus-ID: 1-14:9
Report: hciconfig ID: hci0 rfk-id: 0 state: up address: <filter>
bt-v: 3.0 lmp-v: 5.2
Drives:
Local Storage: total: 14.6 TiB used: 7.11 TiB (48.7%)
ID-1: /dev/nvme0n1 vendor: Samsung model: PM981a NVMe 2048GB
size: 1.86 TiB temp: 37.9 C
ID-2: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
ID-3: /dev/sdb vendor: Samsung model: SSD 870 QVO 8TB size: 7.28 TiB
ID-4: /dev/sdc vendor: Samsung model: SSD 870 QVO 4TB size: 3.64 TiB
Message: No optical or floppy data found.
Partition:
ID-1: / size: 3.58 TiB used: 441.55 GiB (12.0%) fs: ext4 dev: /dev/sdc3
ID-2: /boot/efi size: 146 MiB used: 92 MiB (63.0%) fs: vfat
dev: /dev/nvme0n1p1
Swap:
ID-1: swap-1 type: file size: 2 GiB used: 0 KiB (0.0%) file: /swapfile
Sensors:
System Temperatures: cpu: 48.0 C pch: 49.0 C mobo: N/A gpu: nvidia
temp: 62 C
Fan Speeds (RPM): N/A gpu: nvidia fan: 38%
Info:
Processes: 387 Uptime: 36m Memory: 125.47 GiB used: 3.78 GiB (3.0%)
Init: systemd runlevel: 5 Compilers: gcc: 11.4.0 Packages: 2934 Shell: Bash
v: 5.1.16 inxi: 3.3.13