Hello,
I ran into an issue after a recent driver update.
I’d like to share some details about it.
Any help would be appreciated.
Thank you.
Issue description
A recent driver update (535.54.03-5) caused one of my systems to freeze when idle. Once the screen turns off, it won’t come back on. The GPU fans get loud and the workstation is not responsive at all. Even the Caps Lock LED won’t light up when I press the Caps Lock key. Switching to another TTY (Ctrl + Alt + Fx) also stops working. The input and output devices are completely frozen.
Seems like the freeze issue affects only the X11 server.
The OS still works. I am able to SSH-login to it.
I can stay logged in via SSH, but I cannot restart the display manager (SDDM in my case).
Once the issue happens, the only way to recover is to restart the whole machine.
The restart also takes a long time (I guess systemd is waiting for the frozen processes).
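The guess about systemd waiting on stuck processes could be checked over SSH while the display is frozen. The following is only a sketch: the helper name `list_d_state` is made up for illustration, and it assumes the procps version of `ps` is installed. It prints processes in uninterruptible sleep (state `D`), which usually cannot be killed and tend to delay shutdown.

```shell
# Hypothetical helper (name is an assumption, not part of any tool):
# reads "PID STAT COMMAND" lines on stdin and prints those whose
# state starts with "D" (uninterruptible sleep).
list_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Usage over SSH; the trailing "=" on each column suppresses the header:
#   ps -eo pid=,stat=,comm= | list_d_state
```

If Xorg or sddm shows up in that list during the freeze, that would be consistent with the slow restart described above, though it would not prove the cause.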
Technical details below:
- OS and kernel version:
$ uname -a
Linux msi-kd 5.15.120-1-MANJARO #1 SMP PREEMPT Wed Jul 5 21:45:37 UTC 2023 x86_64 GNU/Linux
(I tried many different kernel versions, the results are always the same)
- The computer is a notebook with dual GPU (Intel UHD Graphics 630 + NVIDIA GeForce GTX 1660 Ti). It runs in NVIDIA-only mode (the configuration is managed by the optimus-manager script).
$ sudo lshw -C display
*-display
description: VGA compatible controller
product: TU116M [GeForce GTX 1660 Ti Mobile]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:155 memory:a4000000-a4ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:4000(size=128) memory:a5000000-a507ffff
*-display
description: VGA compatible controller
product: CoffeeLake-H GT2 [UHD Graphics 630]
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:149 memory:a3000000-a3ffffff memory:80000000-8fffffff ioport:5000(size=64) memory:c0000-dffff
- When the issue happens, I can see the following error logged in the system journal:
kernel: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c57d:0 2:0:3144:3136
Then I can see the following entries when I try to troubleshoot the issue (I think this was when I tried to restart SDDM via SSH):
sddm[1834]: Failed to read display number from pipe
kernel: [drm:drm_new_set_master [drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
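For anyone comparing notes, the entries quoted above can be pulled out of the journal with a small filter. This is a minimal sketch, assuming a systemd journal: `filter_freeze_errors` is a made-up name, and the patterns are simply the three messages from this post, not a complete list of NVIDIA driver errors.

```shell
# Hypothetical helper (name is an assumption): keeps only log lines
# that contain one of the three messages quoted in this post.
filter_freeze_errors() {
    grep -E 'nvidia-modeset: ERROR|Failed to grab modeset ownership|Failed to read display number'
}

# Usage over SSH, current boot:
#   journalctl -b --no-pager | filter_freeze_errors
# Previous boot (only works if the journal is persistent):
#   journalctl -b -1 --no-pager | filter_freeze_errors
```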
- Some extra info for diagnostics:
$ nvidia-smi
Wed Jul 19 01:20:07 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:01:00.0 On | N/A |
| N/A 48C P3 23W / 80W | 1197MiB / 6144MiB | 10% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 4770 G /usr/lib/Xorg 513MiB |
| 0 N/A N/A 4895 G /usr/bin/ksmserver 2MiB |
| 0 N/A N/A 4897 G /usr/bin/kded5 2MiB |
| 0 N/A N/A 4898 G /usr/bin/kwin_x11 138MiB |
| 0 N/A N/A 4924 G /usr/bin/plasmashell 40MiB |
| 0 N/A N/A 4945 G ...b/polkit-kde-authentication-agent-1 2MiB |
| 0 N/A N/A 5039 G /usr/lib/kdeconnectd 2MiB |
| 0 N/A N/A 5045 G /usr/bin/kaccess 24MiB |
| 0 N/A N/A 5099 G /usr/lib/xdg-desktop-portal-kde 2MiB |
| 0 N/A N/A 5653 G /usr/lib/firefox/firefox 335MiB |
| 0 N/A N/A 6841 G /usr/bin/keepassxc 2MiB |
| 0 N/A N/A 7825 G /usr/bin/konsole 2MiB |
| 0 N/A N/A 8663 G /usr/bin/dolphin 2MiB |
| 0 N/A N/A 8804 G /usr/bin/kwrite 2MiB |
| 0 N/A N/A 8825 G /usr/lib/thunderbird/thunderbird 108MiB |
| 0 N/A N/A 14538 G /usr/bin/konsole 2MiB |
| 0 N/A N/A 14851 G /usr/bin/dolphin 2MiB |
| 0 N/A N/A 15319 G /usr/bin/konsole 2MiB |
+---------------------------------------------------------------------------------------+
$ xrandr
Screen 0: minimum 8 x 8, current 5360 x 2520, maximum 32767 x 32767
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-0 connected primary 3440x1440+1920+0 (normal left inverted right x axis y axis) 1mm x 1mm
3440x1440 99.98 + 165.00* 144.00 120.00 59.97
2580x1080 164.69 59.94
2560x1440 59.95
1920x1080 119.88 60.00 59.94 50.00
1680x1050 59.95
1280x1024 75.02 60.02
1280x960 60.00
1280x720 60.00 59.94 50.00
1152x864 75.00
1024x768 75.03 70.07 60.00
800x600 75.00 72.19 60.32 56.25
720x576 50.00
720x480 59.94
640x480 75.00 72.81 59.94 59.93
DP-1 disconnected (normal left inverted right x axis y axis)
eDP-1-1 connected 1920x1080+0+1440 (normal left inverted right x axis y axis) 382mm x 215mm
1920x1080 60.00*+ 59.97 59.96 59.93
1680x1050 59.95 59.88
1400x1050 59.98
1600x900 59.99 59.94 59.95 59.82
1280x1024 60.02
1400x900 59.96 59.88
1280x960 60.00
1440x810 60.00 59.97
1368x768 59.88 59.85
1280x800 59.99 59.97 59.81 59.91
1280x720 60.00 59.99 59.86 59.74
1024x768 60.04 60.00
960x720 60.00
928x696 60.05
896x672 60.01
1024x576 59.95 59.96 59.90 59.82
960x600 59.93 60.00
960x540 59.96 59.99 59.63 59.82
800x600 60.00 60.32 56.25
840x525 60.01 59.88
864x486 59.92 59.57
700x525 59.98
800x450 59.95 59.82
640x512 60.02
700x450 59.96 59.88
640x480 60.00 59.94
720x405 59.51 58.99
684x384 59.88 59.85
640x400 59.88 59.98
640x360 59.86 59.83 59.84 59.32
512x384 60.00
512x288 60.00 59.92
480x270 59.63 59.82
400x300 60.32 56.34
432x243 59.92 59.57
320x240 60.05
360x202 59.51 59.13
320x180 59.84 59.32
1680x1050 (0x1ca) 146.250MHz -HSync +VSync
h: width 1680 start 1784 end 1960 total 2240 skew 0 clock 65.29KHz
v: height 1050 start 1053 end 1059 total 1089 clock 59.95Hz
1280x1024 (0x1cc) 108.000MHz +HSync +VSync
h: width 1280 start 1328 end 1440 total 1688 skew 0 clock 63.98KHz
v: height 1024 start 1025 end 1028 total 1066 clock 60.02Hz
1280x960 (0x1cd) 108.000MHz +HSync +VSync
h: width 1280 start 1376 end 1488 total 1800 skew 0 clock 60.00KHz
v: height 960 start 961 end 964 total 1000 clock 60.00Hz
1024x768 (0x1d4) 65.000MHz -HSync -VSync
h: width 1024 start 1048 end 1184 total 1344 skew 0 clock 48.36KHz
v: height 768 start 771 end 777 total 806 clock 60.00Hz
800x600 (0x1d7) 40.000MHz +HSync +VSync
h: width 800 start 840 end 968 total 1056 skew 0 clock 37.88KHz
v: height 600 start 601 end 605 total 628 clock 60.32Hz
800x600 (0x1d8) 36.000MHz +HSync +VSync
h: width 800 start 824 end 896 total 1024 skew 0 clock 35.16KHz
v: height 600 start 601 end 603 total 625 clock 56.25Hz
640x480 (0x1dd) 25.175MHz -HSync -VSync
h: width 640 start 656 end 752 total 800 skew 0 clock 31.47KHz
v: height 480 start 490 end 492 total 525 clock 59.94Hz
- I’d also like to share the output of the nvidia-bug-report.sh command, but the website won’t let me upload it (there was an error uploading that file). I will try to do it in a separate message.
Update: I can’t get the file-upload function to work on this forum, sorry. I hope the above info will be sufficient.
Best regards,
Kamil.