Please stop. The kernel crash on dereferencing a NULL pointer in a driver’s function is probably the most conclusive and unambiguous indication that a bug is in the driver.
Most likely because it was introduced in that version.
Because the “not using hardware acceleration” option in one userspace program does not reliably prevent any particular piece of the driver’s functionality from being used, especially on a modern desktop that uses compositing for everything. Also, the problem is probably in the implementation of some basic functionality, most likely a race condition in something very common. The number of calls may affect the likelihood of a crash, but the crash can’t be eliminated entirely that way.
If a guess made by @generix is correct, and preemption is either the necessary condition or it greatly increases the probability of a crash being triggered, it would strongly indicate a race condition.
Following recent posts here I have been testing today running kernel 5.10.18 compiled with CONFIG_PREEMPT_NONE=y set, otherwise default config, and the latest 460.39 driver.
So far, in around 6 hours, I have been unable to reproduce the crash whilst watching video in Kodi (for me always the trigger of the crash).
Of course, that time frame is too short to be conclusive. Having said that, previously I could not get past 3/4 days of uptime, using Kodi each day, before getting the crash. So I will see how it goes and report back if I run into it in the coming days.
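For anyone who wants to check which preemption model their own kernel was built with before repeating this test, here is a minimal sketch. It assumes the kernel config is available as text, for example from `/boot/config-$(uname -r)` or from `/proc/config.gz` (not every distribution ships these), and it only knows the three option names used by mainline 5.10-era kernels. The function name `preemption_model` is made up for this example.

```python
def preemption_model(config_text):
    """Return the CONFIG_PREEMPT* choice set to 'y' in a kernel config, or None."""
    # The three mutually exclusive choices in mainline 5.10-era kernels.
    candidates = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_PREEMPT is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value == "y":
            enabled.add(key)
    # Exact key matches only, so CONFIG_PREEMPT_COUNT or CONFIG_PREEMPT_RCU
    # are not mistaken for the preemption model.
    for name in candidates:
        if name in enabled:
            return name
    return None
```

Usage would be something like `preemption_model(open("/boot/config-5.10.18").read())`, where the path is only an example and depends on the distribution.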
We have a fix available for a similar kind of issue; it is included in our latest release, which is available for download at the link below.
Please test with the above driver and share the feedback.
We have fix available for similar kind of issue
What issue do you refer to as “similar”?
What in particular was done to fix it?
Have installed driver 460.56 on Manjaro. Will give it a few days to see if it fixes the issue and report back.
I installed 460.56 on Manjaro almost immediately after the notification about the post here and am still running it. No freezes yet. Will report here in a few days.
I have reproduced this issue by playing a full-screen video in Kodi on HDMI-0 output.
This time I couldn’t collect logs (ssh didn’t work).
xrandr output:
HDMI-0 connected 2560x1440+5120+0 (normal left inverted right x axis y axis) 608mm x 345mm
   3840x2160 30.00 + 29.97 25.00 23.98
   2560x1440 59.95*
   1920x1080 60.00 59.94 50.00 29.97 23.98
   1680x1050 59.95
   1600x900 60.00
   1440x900 59.89
   1280x1024 75.02 60.02
   1280x800 59.81
   1280x720 60.00 59.94 50.00
   1152x864 75.00
   1024x768 75.03 70.07 60.00
   800x600 75.00 72.19 60.32 56.25
   720x576 50.00
   720x480 59.94
   640x480 75.00 72.81 59.94
DP-0 connected primary 5120x1440+0+0 (normal left inverted right x axis y axis) 1mm x 1mm
   3840x1080 99.96 + 59.97
   5120x1440 100.00* 59.98
   2560x1440 59.95
   2560x1080 100.00 60.00 59.94
   1920x1080 100.00 60.00 59.94
   1680x1050 59.95
   1600x900 60.00
   1440x900 59.89
   1280x1024 75.02 60.02
   1280x800 59.81
   1280x720 60.00
   1152x864 75.00
   1024x768 75.03 70.07 60.00
   800x600 75.00 72.19 60.32 56.25
   640x480 75.00 72.81 59.94
OS: ArchLinux
Nvidia drivers: 460.56
Kernel: 5.11.1-zen1-1-zen kernel (Arch Linux).
No freezes yet.
Using kernel 5.11.1-arch1-1
(Obviously using 460.56)
For the sake of everything that is, was, will be or might be sacred, instead of this wall of text, post:
- Nvidia driver version number.
- Kernel version number (or, better, the output of uname -a).
- Types of failures (graphics distortion, uneven video playback speed, slowdown, high CPU load, graphics or full computer lockup, kernel panic), and kernel or other logs if they were collected.
- Software used.
Right, Nvidia's recommendations are not very useful and their collection scripts are often not accessible at the time of failure. Nevertheless, please post something that qualifies as a bug report.
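As a rough sketch of how the first two items in that list could be gathered ahead of time (so they are at hand even when the machine locks up later): the snippet below assumes the proprietary driver exposes its version in `/proc/driver/nvidia/version` on a line of the form `NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.56  ...`. The helper names `parse_driver_version` and `report_header` are made up for this example.

```python
import platform
import re


def parse_driver_version(text):
    """Extract the version number that follows 'Kernel Module' (assumed format)."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def report_header(failure, software, version_path="/proc/driver/nvidia/version"):
    """Build the four-line summary requested in the post above."""
    try:
        with open(version_path) as f:
            driver = parse_driver_version(f.read()) or "unknown"
    except OSError:
        driver = "unknown (could not read %s)" % version_path
    u = platform.uname()  # roughly the fields that uname -a prints
    return "\n".join([
        "Nvidia driver: " + driver,
        "Kernel: %s %s %s %s" % (u.system, u.release, u.version, u.machine),
        "Failure: " + failure,
        "Software: " + software,
    ])
```

For example, `print(report_header("full computer lockup", "Kodi, full-screen video"))` produces a short header that can be pasted at the top of a post, with any logs attached below it.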
“Linux HNT-Quad-ROS 5.10.15-120-tkg-bmq #1 TKG SMP PREEMPT Mon, 15 Feb 2021 02:15:43 +0000 x86_64 GNU/Linux”, which is the ivybridge build of tkg-bmq.
Drivers are chaotic-nvidia-dkms-tkg-460.39.6 (the time the update was posted matches when the issue started).
The issue was video acceleration glitching: low frame rate on video, and high CPU load with memory leaking in Chrome when video acceleration was on. I had to disable it for things to be fine.
Software: everything using video acceleration (Chrome, Steam Store video, Discord, VLC and other media players).
Downgrading many packages related to the video drivers seems to fix it.
Looks like a problem with 460.39 support of older GPUs. It should be reported separately with this information and last working driver version.
This thread is about a different problem – one that seems to affect all GPUs and causes a kernel panic, is present in 460.39, and might be fixed in 460.56.
Two days in using driver 460.56 and so far no crashes. Won’t count my chickens just yet but it’s looking promising.
Fifth day of using the new fixed driver. Still no crashes, with the PC on almost all day. The bug is fixed, I suppose.
Manjaro, 5.10.18, Nvidia 460.56
I’m jealous. I’ve been reporting nvkms crashdumps in the ‘stable’ driver for over a -year- and you guys got NVidia to fix the issue in less than 5 months!
I had a second crash since I started using the newest drivers (460.56/5.11.1-arch1-1).
The crash happened when I was away from the keyboard for around 12 minutes.
This time I was able to connect through ssh and collect logs using:
nvidia-bug-report.sh --safe-mode --extra-system-data
nvidia-bug-report.log.gz (91.9 KB)
Just to follow up on my previous post: I have been running the older, problematic driver 460.39 with a 5.10 kernel compiled with preemption disabled, and have not had the crash once in almost 7 days of uptime, doing the same activity that was causing the crash every day.
So I would say, from my albeit limited testing, you guys were quite probably correct here.
I am going to try the latest driver now with my normal kernel config, with preemption. It looks good based on the lack of reports here so far, so hopefully they fixed it this time.
I took a peek at your bug report and I don’t think that is the same problem; at least the log looks different from all the others in this thread.
I also guess that kamiox’s new crashes are a different bug, introduced in 460.56:
https://forums.developer.nvidia.com/t/display-detection-always-crashes-hard-locks-arch-linux/169653
The second crash might be a different bug, but my first crash on the recent drivers was very similar to those previously reported. It happened when I was running VLC in full-screen mode; unfortunately, the system crashed completely, so I was unable to get any logs from the machine.
I filed internal bug 3268472 for this new crash.