Bug report: 455.23.04 - Kernel Panic due to NULL pointer dereference

This is not an nVidia issue.
This is not an nVidia issue.
This is not an nVidia issue.

It is a confluence of DisplayPort, xHCI, Renesas, browser, config, and integration issues.

This isn’t about fixing your specific issue, but rather about this entire thread.

I’ve just read every post, bug report and log extract on this thread.
This is a super easy fix.
Firstly, to clarify:
Linux is not supposed to work out of the box.
That’s a Closed Source Market Standard.
The Open Source End User is “FREE” to “finish” the Open Source product to a Closed Source Market definition of “state of finish”.

So the vast majority of posts involve:

  • nVidia driver 455
  • Arch Linux
  • Kernel 5.8+ to 5.9.1, low-latency / preemptive
  • Intel LGA 1155, 1151, 1151v2, 1150
  • kernel NULL pointer dereference, address: 0000000000000020
  • kernel NULL pointer dereference, address: 0000000000000027
  • Chromium and Chromium-based browsers: Chrome, Opera, Falkon
  • Firefox

Extract from the Chromium page of the Arch Linux wiki:

Hardware video acceleration

  • There is no official support from Chromium or Arch Linux for this feature (VA-API), but you may ask for help in the dedicated forum thread.

  • chromium from official repositories is compiled with VA-API support.

  • For proprietary NVIDIA support, installing libva-vdpau-driver-chromium (AUR) or libva-vdpau-driver-vp9-git (AUR) is required.

  • Wayland is not supported.

  • To use VA-API on XWayland, use the --use-gl=egl flag. Currently exhibits choppiness (FS#67035). It could be solved by enabling native Wayland support.

  • To use VA-API on Xorg, use the --use-gl=desktop flag.

  • Starting in Chromium 86, there will be support for VA-API when using the ANGLE GL renderer. Use the --enable-accelerated-video-decode flag to enable it on an Intel GPU.

BTW, how’s Arch working out for ya!?

If your system isn’t configured and integrated as per The Book, and the above browsers aren’t jury-rigged with workarounds, then this WILL exacerbate and exploit the poor system integration and configuration, coupled with the lack of support in the kernel or other such issues.

Correct BIOS settings are critical.
Correct kernel parameters are critical.

I also saw multiple posts using PowerSave in some form as well.
This affects the nVidia driver too. It wants to ramp up and is getting choked.
Disable all power management for PCI Express.
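For anyone who wants to check the current PCI Express power management state before changing anything: on kernels built with ASPM support, the policy list can usually be read from /sys/module/pcie_aspm/parameters/policy, with the active entry shown in square brackets. A minimal Python sketch, assuming that file is present on your system (the helper names are made up for illustration):

```python
import re
from pathlib import Path

# Assumed location; only present when the kernel has PCIe ASPM support.
POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the policy name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_active_policy(path=POLICY_FILE):
    """Read the sysfs file if it exists; return None when it is missing."""
    if not path.exists():
        return None
    return active_aspm_policy(path.read_text())
```

Calling read_active_policy() returns something like "powersave" or "performance", or None if the file is missing. Booting with the pcie_aspm=off kernel parameter is the usual way to switch ASPM off entirely.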

The biggest issue is BIOS DMA buffer / VM / IOMMU and xHCI settings and support.
Is xHCI handover still enabled in the BIOS?
USB 2 / PS/2 legacy support uses VM in the BIOS.

Kernel 5.8
In Arch Linux and Manjaro, the 5.8+ kernel has issues with Renesas USB controllers due to a firmware version check issue.

Kernel EDID patch: 20201203
Removes the Skylake/Kaby Lake platform detection logic and makes the EDID function work on all platforms. Regardless, with the patch, a kernel oops occurs in the function intel_vgpu_reg_rw_edid in drivers/drm/i915/kvmgt.c.

outtatime

So … the bug began with nvidia driver 450, it is not limited to Arch Linux, kernels 5.4 and 4.19 are also hit, and in some cases the crash happened without any browser (VLC, for example).

As for hardware video acceleration, the crash happened without this feature too (yes, I tried again and again; now I just compile the 435 driver for my kernel and I don’t have any crashes).

I don’t use PowerSave (because of nvidia; that problem has been around for a few years now). The firmware version problem has been solved for a few months (and to clarify, previous kernels have the same bug with the nvidia driver).

Why doesn’t this bug hit nvidia drivers before 450, and why does it still happen without Chrome or anything else using hardware acceleration?

My questions are not really questions; they are just there for comparison with the post above, @abelits.

PS: thanks for all the instructive links.

Please stop. The kernel crash on dereferencing a NULL pointer in a driver’s function is probably the most conclusive and unambiguous indication that a bug is in the driver.
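The small fault addresses reported in this thread (such as 0x20) are consistent with a member access through a NULL structure pointer: the CPU faults at NULL plus the member's offset, not at 0 itself. A rough Python sketch using ctypes shows the arithmetic. The structure layout and names here are entirely hypothetical and are not taken from any real driver:

```python
import ctypes


class HypotheticalDriverObject(ctypes.Structure):
    # Made-up layout for illustration only; not from the nvidia driver.
    _fields_ = [
        ("lock", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),
        ("refcount", ctypes.c_uint64),
        ("owner", ctypes.c_uint64),
        ("ops", ctypes.c_uint64),  # fifth 8-byte member
    ]


def fault_address(base, field_name):
    """Address touched when reading `field_name` through pointer `base`."""
    return base + getattr(HypotheticalDriverObject, field_name).offset


NULL = 0
print(hex(fault_address(NULL, "ops")))  # prints 0x20
```

So an oops at address 0x20 suggests code read a member 32 bytes into an object whose pointer was NULL. Which member and which object only the driver source can tell.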

Most likely because it was introduced in that version.

Because the “not using hardware acceleration” option in one userspace program does not reliably prevent any particular piece of the driver’s functionality from being used, especially on a modern desktop that uses compositing for everything. Also, the problem is probably in the implementation of some basic functionality, most likely a race condition in something very common. The number of calls may affect the likelihood of a crash; however, the crash can’t be eliminated entirely that way.

If the guess made by @generix is correct, and preemption is either a necessary condition or greatly increases the probability of a crash being triggered, it would strongly indicate a race condition.
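To illustrate why preemption would matter for a race like this, here is a toy Python simulation of an unlocked check-then-use sequence. It is only a sketch of the general pattern, not driver code: the names are invented, and the "preemption point" is modelled with a generator yield so that the interleaving is deterministic.

```python
def reader_steps(res):
    """Unlocked check-then-use, split where a task switch could happen."""
    if res["ptr"] is None:
        return "skipped"
    yield  # a preemptible kernel may switch tasks here
    return res["ptr"]["field"]  # fails if ptr was cleared in the meantime


def finish(gen):
    """Run a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


def simulate(preempt_between_check_and_use):
    res = {"ptr": {"field": 7}}  # stands in for a valid pointer
    reader = reader_steps(res)
    if preempt_between_check_and_use:
        next(reader)       # reader passes the NULL check, then is preempted
        res["ptr"] = None  # another task tears the object down
    try:
        return finish(reader)
    except TypeError:
        # Python's stand-in for the NULL pointer dereference
        return "crash"
```

simulate(False) returns 7 because the reader runs to completion undisturbed, while simulate(True) returns "crash". Without preemption the window between check and use is much harder to hit on a single CPU, although other CPUs can still race, which would fit a bug that gets rarer but not necessarily impossible.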

Following recent posts here, I have been testing today with kernel 5.10.18 compiled with CONFIG_PREEMPT_NONE=y (otherwise default config) and the latest 460.39 driver.

So far I have been unable to reproduce the crash whilst watching video in Kodi (for me always the trigger of the crash) in around 6 hours.

Of course, the crash is not absolutely reliable to reproduce in such a time frame. Having said that, previously I could not get past 3-4 days of uptime whilst using Kodi each day before getting the crash. So I will see how it goes and report back if I run into it in the coming days.
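For others who want to compare their setup with the CONFIG_PREEMPT_NONE=y build mentioned above, a small Python sketch can report which preemption model a kernel config selects. It assumes the config is exposed at /proc/config.gz or /boot/config-<release>, which depends on the distribution, and it only looks at the three classic options found in 5.10-era kernels. The function names are invented for this example:

```python
import gzip
import platform
import re
from pathlib import Path

PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
)


def preempt_model(config_text):
    """Return the preemption option set to 'y' in a kernel config, if any."""
    enabled = set(
        re.findall(r"^(CONFIG_PREEMPT\w*)=y$", config_text, flags=re.MULTILINE)
    )
    for option in PREEMPT_OPTIONS:
        if option in enabled:
            return option
    return None


def load_running_config():
    """Best-effort read of the running kernel's config; None if not found."""
    proc = Path("/proc/config.gz")  # needs CONFIG_IKCONFIG_PROC
    if proc.exists():
        return gzip.decompress(proc.read_bytes()).decode()
    boot = Path("/boot") / f"config-{platform.release()}"
    if boot.exists():
        return boot.read_text()
    return None
```

On a machine where the config is available, preempt_model(load_running_config()) should return "CONFIG_PREEMPT_NONE" for a build like the one tested here.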

We have a fix available for a similar kind of issue. It is included in our latest release, which is available for download at the link below.
https://www.nvidia.co.in/Download/driverResults.aspx/170813/en-in

Please test with the above driver and share your feedback.

We have fix available for similar kind of issue

What issue do you refer to as “similar”?

What in particular was done to fix it?