I doubt it’s in there. No mention in the changelog and seeing that they gave us this patch (which I haven’t tried yet) it’ll probably be a long time until they fix it in a future release.
On your question about release timeline, see their previous answer here:
This was something they had given a release timeline on already (“mid November”), but I hadn’t considered that “5.9 compatible” would include this bug. Since it’s now a security issue too, I figured they might give an updated schedule on e.g. 450 series 5.9 compatibility or clarify their position on the severity.
How did you accomplish the downgrade? I’m on Fedora 32 also and I can’t figure out how to downgrade to 450. There isn’t a package for it in the rpmfusion repository. Did you use the official version from nvidia instead?
Yes, I downloaded the official version from NVIDIA. After having had issues with the various repositories from time to time, I’ve used this excellent blog article as reference: https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/
and been using the official version from NVIDIA directly for years now.
BTW, the patch process to 455 mentioned by @aplattner on 11 November has worked marvellously!! I have not had a crash since 12 November when I last rebooted.
Thank you for your help, this worked.
I originally wanted to try exactly this, but I thought that blacklisting nouveau would result in a black screen if no nvidia driver is installed.
I followed this guide for blacklisting https://wiki.archlinux.org/index.php/Kernel_module#Blacklisting
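For anyone who wants the short version of that guide: blacklisting comes down to a one-line file under /etc/modprobe.d/. The following is only a minimal sketch. The helper name `write_nouveau_blacklist` and the file name `blacklist-nouveau.conf` are my own choices for illustration, and the target directory is passed as an argument so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: write a modprobe blacklist entry for nouveau.
# $1 is the target directory; on a real system this would be /etc/modprobe.d
# (root required). The file name is an arbitrary choice for this sketch.
write_nouveau_blacklist() {
    dir="$1"
    printf 'blacklist nouveau\n' > "$dir/blacklist-nouveau.conf"
}

# Try it against a scratch directory first.
scratch="$(mktemp -d)"
write_nouveau_blacklist "$scratch"
cat "$scratch/blacklist-nouveau.conf"
```

If nouveau is included in the initramfs, the initramfs may also need to be regenerated before the blacklist takes effect at boot.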
The posted patch seems to be working great for me on Arch Linux; I have been using it without any problems for a week or so. I haven’t tried whether the patch works on newer driver versions, but if you don’t want a driver update to undo it, you should exclude the nvidia packages from updates. On Arch / Manjaro you can do this by uncommenting the IgnorePkg line in the pacman config file (/etc/pacman.conf) and adding nvidia (or nvidia-dkms, depending on which one you installed), nvidia-utils, nvidia-settings and lib32-nvidia-utils to it.
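A rough sketch of that pacman.conf change, in case it helps. The helper name `pin_nvidia_packages` is made up for this example, and the package list assumes the plain nvidia package (swap in nvidia-dkms if that is what you installed). It rewrites any existing IgnorePkg line, so packages you already ignore would have to be added back to the list. It uses GNU sed and works on whatever file you pass in, so try it on a copy first.

```shell
# Hypothetical helper: set IgnorePkg in a pacman.conf-style file.
# $1 is the file to edit; point it at a copy before touching /etc/pacman.conf.
# Rewrites every IgnorePkg line, whether commented out or not.
pin_nvidia_packages() {
    conf="$1"
    pkgs="nvidia nvidia-utils nvidia-settings lib32-nvidia-utils"
    sed -i -E "s/^#?[[:space:]]*IgnorePkg[[:space:]]*=.*/IgnorePkg = $pkgs/" "$conf"
}

# Demonstrate on a scratch file that mimics the commented-out default line.
conf="$(mktemp)"
printf '[options]\n#IgnorePkg   =\n' > "$conf"
pin_nvidia_packages "$conf"
cat "$conf"
```

Remember to remove the entries again once a fixed driver is released, otherwise pacman will keep skipping those packages.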
root@host:/usr/local/src/nvidia# bash NVIDIA-Linux-x86_64-455.38.run --apply-patch bsingharora.patch
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 455.38..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- nvidia-modeset/nvidia-modeset-linux.c.org 2020-11-23 20:46:12.817979880 +1100
|+++ nvidia-modeset/nvidia-modeset-linux.c 2020-11-24 10:50:31.474395155 +1100
--------------------------
File to patch:
Your patch header does not look like the one provided by aplattner.
Compare the two and modify yours so that it applies like the NVIDIA patch.
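The "can't find file to patch" message means patch could not find the path from the header relative to the directory it was run in. A quick way to compare two patches is to print just the paths their headers name. This is only a sketch: `patch_targets` is a hypothetical helper, it assumes paths without spaces, and it can be fooled by a removed line in the diff body that happens to start with `-- `. Which directory and strip level the installer's --apply-patch uses is not stated in this thread, so the safest reference is a patch that is known to apply.

```shell
# Hypothetical helper: print the file paths named in a unified diff's headers.
# Reads the patch from stdin; anything after the path (timestamps) is dropped.
patch_targets() {
    grep -E '^(---|\+\+\+) ' | sed -E 's/^(---|\+\+\+) //; s/[[:space:]].*$//'
}

# The header lines from the failing run above, fed in as sample input.
printf '%s\n' \
    '--- nvidia-modeset/nvidia-modeset-linux.c.org 2020-11-23 20:46:12.817979880 +1100' \
    '+++ nvidia-modeset/nvidia-modeset-linux.c 2020-11-24 10:50:31.474395155 +1100' \
    | patch_targets
```

Running `patch_targets < bsingharora.patch` and the same on aplattner's patch should show at a glance whether the path prefixes differ.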
I am pleased to report that I successfully applied the patch from @aplattner to version 455.45.01 and have not encountered a random display failure after almost 4 days of uptime (was previously getting failures every 1-2 days).
Not sure if the issue I’ve been seeing on my computer (GTX 1070ti) is related, but I can reproduce a video lockup by starting a VR session on my computer via Steam, exiting, and starting another one right after. Video just locks up at that point.
Or start a VR session after the computer was on for a while.
But that first one is the usual repro case for me.
Some good news for Arch Linux users: thanks to the efforts of the Frogging-Family/nvidia-all project, some of the patches from this thread have been included in its package-building PKGBUILD. You can find all the info needed on the linked GitHub page.
Apparently, on occasion, the hardlocks happen on the first attempt at launching SteamVR. Meaning my repro case is “reliable” only if/when it doesn’t crash in the first place.
It seems that the issues with my hard-hang are mitigated by disabling KDE/kwin composition altogether. Meaning either there are oddities in the driver at context setup if/when a certain “type” of context already exists, or KWin puts the driver in an odd position if/when certain composition options are active.
Yeah seriously. Where is the final fix? This is a MAJOR problem for me. How has this NOT been addressed officially yet? WTF? Thank the good lord there are some folks here nice enough to offer a patch. But why no NVidia folks? Really disappointed in their lack of response here.
Another random user chiming in here to report that this seems to have finally fixed the problem for me as well - 48+ hours since I purged the Ubuntu nvidia-drivers-455 packages and reinstalled the NVIDIA-Linux-x86_64-455.45.01.run package with aplattner’s patch applied and no re-occurrence of the fault.
In case the info is helpful to anyone: I’ve been experiencing exactly the same (GFP_KERNEL|__GFP_COMP) kernel fault, but with noticeably different results from what most report in this thread. I’ve also been having this error for much longer, 2 to 3 months I’d guess. Initially it was only mildly annoying, as it would restart my DM after 5 seconds of being frozen out, with no further consequences. About 3 weeks ago the symptoms suddenly got worse: the desktop would freeze for 60 seconds or so before restarting, and that would trigger further segmentation faults in the DM. After that had happened it wouldn’t be long before the fault recurred, so a lengthy reboot became mandatory after every single crash. Like everyone else, I was getting them entirely randomly: anywhere from during login itself up to 24 hours or so later.
The fault isn’t random: as far as I can tell, it has to do with memory pressure. It definitely triggered more easily with the system under heavy load, and most easily on my ZFS-based system when the ARC was full and large amounts of files were being staged into L2ARC. Actual GPU load is normally as good as zero on my system.
Many thanks for the helpful thread contributions that let me piece together the fix.
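If you want to check whether your own freezes are the same fault, one rough approach is to count kernel log lines that carry the allocation flags quoted above. This is a sketch only: `count_gfp_faults` is a hypothetical helper, and it assumes the log line contains the literal string `GFP_KERNEL|__GFP_COMP` as it appears in this thread.

```shell
# Hypothetical helper: count kernel log lines that mention the allocation
# flags from this thread. Reads log text from stdin.
count_gfp_faults() {
    # -F treats the pattern as a fixed string, so '|' is matched literally.
    # grep exits non-zero when nothing matches; '|| true' keeps the exit status 0.
    grep -cF 'GFP_KERNEL|__GFP_COMP' || true
}

# Usage (dmesg may require root on some systems):
#   dmesg | count_gfp_faults
#   journalctl -k -b | count_gfp_faults
```

A count of 0 only means the string was not found in the log you fed in; it does not rule out the fault in earlier boots.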