Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

I installed 460.56 on Manjaro almost immediately after the notification about the post here and am still running it. No freezes yet. I will report back here in a few days.

I have reproduced this issue by playing a full-screen video in Kodi on HDMI-0 output.
This time I couldn’t collect logs (ssh didn’t work).

xrandr output:

HDMI-0 connected 2560x1440+5120+0 (normal left inverted right x axis y axis) 608mm x 345mm
   3840x2160     30.00 +  29.97    25.00    23.98  
   2560x1440     59.95* 
   1920x1080     60.00    59.94    50.00    29.97    23.98  
   1680x1050     59.95  
   1600x900      60.00  
   1440x900      59.89  
   1280x1024     75.02    60.02  
   1280x800      59.81  
   1280x720      60.00    59.94    50.00  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32    56.25  
   720x576       50.00  
   720x480       59.94  
   640x480       75.00    72.81    59.94  
DP-0 connected primary 5120x1440+0+0 (normal left inverted right x axis y axis) 1mm x 1mm
   3840x1080     99.96 +  59.97  
   5120x1440    100.00*   59.98  
   2560x1440     59.95  
   2560x1080    100.00    60.00    59.94  
   1920x1080    100.00    60.00    59.94  
   1680x1050     59.95  
   1600x900      60.00  
   1440x900      59.89  
   1280x1024     75.02    60.02  
   1280x800      59.81  
   1280x720      60.00  
   1152x864      75.00  
   1024x768      75.03    70.07    60.00  
   800x600       75.00    72.19    60.32    56.25  
   640x480       75.00    72.81    59.94 

OS: Arch Linux
Nvidia driver: 460.56
Kernel: 5.11.1-zen1-1-zen

No freezes yet.
Using kernel 5.11.1-arch1-1
(Obviously using 460.56)

Hi, I use Arch Linux with an Intel i5-3450 CPU and a GTX 650 Ti. I know I am almost at the fucking cutoff line of support for the driver… But I reinstalled my Arch Linux last week after a fuck-up with BTRFS. My PC was running fine, updating every day and hibernating at night, until I got an issue with Chrome freezing (it always happens with as many tabs as I have). So I applied more updates and rebooted… Now, with the latest update, I opened Chrome and noticed everything started to fuck up… YouTube was using 80% of the GPU and leaking memory. I soon realised Discord, Steam and every video player were also suffering badly. It seems the video acceleration issue started with an update sometime in the last 10 days or so… So of course it was driving me crazy. I downgraded a shitload of video packages and everything’s fine now. All I could find on Google was this thread… I use chaotic-aur’s TKG suite of kernel, Nvidia driver, Mesa and more… So of course when I was reading this thread I realised YOU GUYS FUCKED UP THE PATCH FOR THAT ISSUE… so people who had no issue with the video driver are now thrown in with everyone else who has issues… Of course it’s an upstream issue. I am fairly sure it’s not TKG, since all they do is reapply the patches that had been working fine for months on top of your source… Of course I am in dire need of a GPU upgrade; I have a lot of vintage keyboards I plan to trade for PC parts, including a GPU… But I don’t know, I might just go AMD, if they weren’t also having issues once in a while. I kind of just want to game, too… One of my friends offered me a more recent Nvidia GPU, but I am very unsure whether I want to bite the bullet on that upgrade now. Since I have downgraded, I cannot reproduce the issue without upgrading again, and since I am moving on Monday I need my PC working, so I will abstain from doing that.

TL;DR: You guys introduced a video acceleration issue on my GTX 650 Ti running Arch Linux.

For the sake of everything that is, was, will be or might be sacred, instead of this wall of text, post:

  1. Nvidia driver version number.
  2. Kernel version number (or better, the output of uname -a).
  3. Types of failures (graphics distortion, uneven video playback speed, slowdown, high CPU load, graphics or full computer lockup, kernel panic; include the kernel log or other logs if any were collected).
  4. Software used.
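
The four items above could be gathered with a small script along these lines. This is only a sketch under assumptions: it assumes the proprietary driver exposes its version in /proc/driver/nvidia/version on a line starting with "NVRM version", and the helper names (parse_driver_version, build_report) are made up for illustration. The failure description and software list still have to be typed in by hand.

```python
import platform
import re
from pathlib import Path

# Assumed location of the driver version text; absent if the module is not loaded.
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_driver_version(text):
    """Return the first dotted version number on the 'NVRM version' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version"):
            match = re.search(r"\b\d+\.\d+(?:\.\d+)?\b", line)
            return match.group(0) if match else None
    return None


def build_report(driver_text, failure, software):
    """Format the four requested items as plain text."""
    driver = parse_driver_version(driver_text) or "unknown"
    u = platform.uname()
    # Roughly the same fields that `uname -a` prints.
    kernel = f"{u.system} {u.node} {u.release} {u.version} {u.machine}"
    return "\n".join([
        f"1. Nvidia driver: {driver}",
        f"2. Kernel: {kernel}",
        f"3. Failure: {failure}",
        f"4. Software: {software}",
    ])


if __name__ == "__main__":
    text = NVIDIA_VERSION_FILE.read_text() if NVIDIA_VERSION_FILE.exists() else ""
    print(build_report(text, "kernel panic during full-screen video", "Kodi"))
```

Running it before a crash and keeping the output somewhere off the machine sidesteps the problem of collection scripts being unreachable once the system has locked up.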

Right, Nvidia’s recommendations are not very useful and their collection scripts are often not accessible at the time of failure. Nevertheless, please post something that qualifies as a bug report.

“Linux HNT-Quad-ROS 5.10.15-120-tkg-bmq #1 TKG SMP PREEMPT Mon, 15 Feb 2021 02:15:43 +0000 x86_64 GNU/Linux”, which is the Ivy Bridge build of tkg-bmq.

The driver is chaotic-nvidia-dkms-tkg-460.39.6 (the time that update was posted matches when the issue started).

The issue was video acceleration glitching, low frame rates on video, and high CPU load with memory leaking in Chrome when video acceleration was on; I had to disable it for things to be fine.

Software: everything using video acceleration (Chrome, Steam store videos, Discord, VLC and other media players).

Downgrading many packages related to the video drivers seems to have fixed it.

Looks like a problem with 460.39’s support of older GPUs. It should be reported separately, with this information and the last working driver version.

This thread is about a different problem – one that seems to affect all GPUs and causes a kernel panic, is present in 460.39, and might be fixed in 460.56.

Two days in using driver 460.56 and so far no crashes. Won’t count my chickens just yet but it’s looking promising.


Fifth day of using the new fixed driver. Still no crashes, with the PC on almost all day. The bug is fixed, I suppose.
Manjaro, 5.10.18, Nvidia 460.56

I’m jealous. I’ve been reporting nvkms crashdumps in the ‘stable’ driver for over a -year- and you guys got NVidia to fix the issue in less than 5 months!

@amrits @aplattner


I have had a second crash since I started using the newest drivers (460.56/5.11.1-arch1-1).

The crash happened while I was away from the keyboard for around 12 minutes.

This time I was able to connect through ssh and collect logs using nvidia-bug-report.sh --safe-mode --extra-system-data

nvidia-bug-report.log.gz (91.9 KB)

Just to follow up on my previous post: I have been running the older, problematic driver 460.39 with a 5.10 kernel compiled with preemption disabled, and have not had the crash once in almost 7 days of uptime, doing the same activity that was causing a crash every day.

So I would say, from my albeit limited testing, that you guys were quite probably correct here.

I am going to try the latest driver now with my normal kernel config, with preemption enabled. It looks good so far based on the lack of reports here, so hopefully they fixed it this time.

I took a peek at your bug report and I don’t think that is the same problem; at least, the log looks different from all the others in this thread.

I also guess that kamiox’s new crashes are a different bug, introduced in 460.56:

The second crash might be a different bug, but my first crash on the recent drivers was very similar to those previously reported. It happened while I was running VLC in full-screen mode; unfortunately, the system crashed completely, so I was unable to get any logs from the machine.

I filed internal bug 3268472 for this new crash.


I have now been running the new drivers for around a week without any crashes. :)

So far no problems since February 25, when I installed 460.56.

I had downgraded CUDA to 10.2.89 before, so I could downgrade the driver to 440.100. Now I will try to upgrade CUDA again to 11.2.0, and see if it will cause any trouble.

Installed 460.56-1 on 02/25 and haven’t had issues since. Installing 460.56-2 on 03/08 and seeing how that will go.

Only half a year to fix a critical bug, and years in the making for proper Wayland support, maybe Linus was wrong after all. Thanks nvidia!

I’m running 460.56 but recompiled my kernel with CONFIG_PREEMPT=n. So far no problems, and I can have the latest CUDA and all the fancy stuff. Thanks @generix.

I have the same CONFIG_PREEMPT setting in kernel 5.4.97, and it seems to work so far. CUDA 11.2.0 and 11.2.1 compilation crashes; 11.1.1, however, builds and works.
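
For anyone wanting to check which preemption model their running kernel was built with before comparing results, a rough sketch follows. It assumes the kernel config is readable either at /proc/config.gz (only present when the kernel was built with that feature) or at /boot/config-<release>; neither is guaranteed on every distribution. The function names are made up for illustration.

```python
import gzip
import platform
import re
from pathlib import Path

# The three classic preemption choices in the kernel configuration.
PREEMPT_OPTIONS = ("CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def preempt_settings(config_text):
    """Map each preemption option to True if it is set to 'y' in the config text."""
    settings = {}
    for name in PREEMPT_OPTIONS:
        # Anchoring on '=y' keeps CONFIG_PREEMPT from matching CONFIG_PREEMPT_NONE.
        enabled = re.search(rf"^{name}=y$", config_text, re.MULTILINE)
        settings[name] = bool(enabled)
    return settings


def read_kernel_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    proc = Path("/proc/config.gz")  # assumed path, exists only on some kernels
    if proc.exists():
        with gzip.open(proc, "rt") as handle:
            return handle.read()
    boot = Path("/boot") / f"config-{platform.release()}"  # assumed path
    if boot.exists():
        return boot.read_text()
    return None


if __name__ == "__main__":
    text = read_kernel_config()
    if text is None:
        print("kernel config not found")
    else:
        for name, enabled in preempt_settings(text).items():
            print(f"{name}: {'y' if enabled else 'not set'}")
```

A kernel like the ones described above, built without full preemption, would be expected to report CONFIG_PREEMPT as not set.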