455.23.04: Page allocation failure in kernel module at random points

I doubt it’s in there. There is no mention of it in the changelog, and seeing that they gave us this patch (which I haven’t tried yet), it’ll probably be a long time until they fix it in a future release.

On your question about release timeline, see their previous answer here:

This was something they had given a release timeline on already (“mid November”), but I hadn’t considered that “5.9 compatible” would include this bug. Since it’s now a security issue too, I figured they might give an updated schedule on e.g. 450 series 5.9 compatibility or clarify their position on the severity.

How did you accomplish the downgrade? I’m on Fedora 32 also and I can’t figure out how to downgrade to 450. There isn’t a package for it in the rpmfusion repository. Did you use the official version from nvidia instead?

Yes, I downloaded the official version from NVIDIA. After having had issues with the various repositories from time to time, I’ve used this excellent blog article as reference:
https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/
and been using the official version from NVIDIA directly for years now.

BTW, the patch process to 455 mentioned by @aplattner on 11 November has worked marvellously!! I have not had a crash since 12 November when I last rebooted.

@aplattner - my promised feedback: I applied the above patch to 455.38 on 12 Nov, and have not had one crash since then. 7 days uptime! :-]

Thank you for your help, this worked.
I originally wanted to try exactly this, but I thought that blacklisting nouveau would result in a black screen if no nvidia driver is installed.
I followed this guide for blacklisting

The patch seemed to fail on Fedora 32. The nvidia-modeset.ko.xz is included in the initramfs image. I needed to manually rebuild that:

dracut --force /boot/initramfs-$(uname -r).img $(uname -r)

Examination of the nvkms_alloc function in nvidia_modeset module with gdb disassemble in /proc/kcore now shows the expected change.

The posted patch seems to be working great for me on Arch Linux; I have been using it without any problems for a week or so. I haven’t tried whether the patch works on newer driver versions, but if you don’t want a driver update to undo it, you should exclude the nvidia packages from updates. On Arch / Manjaro you can do this by uncommenting the IgnorePkg line in the pacman config file (/etc/pacman.conf) and adding nvidia (or nvidia-dkms, depending on which one you installed), nvidia-utils, nvidia-settings and lib32-nvidia-utils to it.
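
As a rough sketch of that pacman.conf edit: the snippet below works on a temporary sample file rather than the real /etc/pacman.conf, and the sample contents and variable names are made up for illustration. It assumes the stock commented-out `#IgnorePkg   =` line is present. Note that it overwrites whatever IgnorePkg already lists, so merge by hand if you already ignore other packages.

```shell
# Package list taken from the post above; use nvidia-dkms instead of nvidia
# if that is the package you installed.
pkgs="nvidia nvidia-utils nvidia-settings lib32-nvidia-utils"

# Hypothetical sample standing in for /etc/pacman.conf.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[options]
#IgnorePkg   =
#IgnoreGroup =
EOF

# Uncomment the IgnorePkg line (if it is commented) and set its value.
# This replaces any existing value on that line.
sed -E "s|^#?[[:space:]]*IgnorePkg[[:space:]]*=.*|IgnorePkg = ${pkgs}|" "$conf" > "$conf.new"
mv "$conf.new" "$conf"

result="$(grep '^IgnorePkg' "$conf")"
echo "$result"
rm -f "$conf"
```

On a real system the same sed expression would be pointed at /etc/pacman.conf as root, ideally after making a backup copy.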

Thanks. I’ve been using rpmfusion until now but with this bug in the latest release it might be time to switch.

I recently upgraded to 455.45 and am seeing this problem there too. Will try downgrading to 455.38 and applying the patch described above.

I am attaching a patch here which I think is the right way to handle the BUG. If you need my signed-off-by, please reach out or just add it.

--- nvidia-modeset/nvidia-modeset-linux.c.org	2020-11-23 20:46:12.817979880 +1100
+++ nvidia-modeset/nvidia-modeset-linux.c	2020-11-24 10:50:31.474395155 +1100
@@ -21,6 +21,7 @@
 #include <linux/file.h>
 #include <linux/list.h>
 #include <linux/rwsem.h>
+#include <linux/mm.h>
 
 #include "nvstatus.h"
 
@@ -169,33 +170,19 @@ static inline void nvkms_write_unlock_pm
  * are called while nvkms_lock is held.
  *************************************************************************/
 
-/* Don't use kmalloc for allocations larger than 128k */
-#define KMALLOC_LIMIT (128 * 1024)
-
+/*
+ * Let the system decide when to switch between kmalloc and vmalloc
+ */
 void* NVKMS_API_CALL nvkms_alloc(size_t size, NvBool zero)
 {
-    void *p;
-
-    if (size <= KMALLOC_LIMIT) {
-        p = kmalloc(size, GFP_KERNEL);
-    } else {
-        p = vmalloc(size);
-    }
-
-    if (zero && (p != NULL)) {
-        memset(p, 0, size);
-    }
-
-    return p;
+    if (zero)
+        return kvzalloc(size, GFP_KERNEL);
+    return kvmalloc(size, GFP_KERNEL);
 }
 
 void NVKMS_API_CALL nvkms_free(void *ptr, size_t size)
 {
-    if (size <= KMALLOC_LIMIT) {
-        kfree(ptr);
-    } else {
-        vfree(ptr);
-    }
+    kvfree(ptr);
 }
 
 void* NVKMS_API_CALL nvkms_memset(void *ptr, NvU8 c, size_t size)
root@host:/usr/local/src/nvidia# bash NVIDIA-Linux-x86_64-455.38.run --apply-patch bsingharora.patch
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 455.38..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- nvidia-modeset/nvidia-modeset-linux.c.org  2020-11-23 20:46:12.817979880 +1100
|+++ nvidia-modeset/nvidia-modeset-linux.c      2020-11-24 10:50:31.474395155 +1100
--------------------------
File to patch:

Your patch header does not look like the one provided by aplattner.

Compare the two below: the NVIDIA patch has a leading kernel.orig/ and kernel/ path component, so modify yours to match:

root@host:/usr/local/src/nvidia# head -3 reduce-kmalloc-limit-455.38.patch bsingharora.patch
==> reduce-kmalloc-limit-455.38.patch <==
diff -Naur kernel.orig/nvidia-modeset/nvidia-modeset-linux.c kernel/nvidia-modeset/nvidia-modeset-linux.c
--- kernel.orig/nvidia-modeset/nvidia-modeset-linux.c   2020-10-21 23:17:41.000000000 -0700
+++ kernel/nvidia-modeset/nvidia-modeset-linux.c        2020-11-04 10:35:44.113986369 -0800

==> bsingharora.patch <==
--- nvidia-modeset/nvidia-modeset-linux.c.org   2020-11-23 20:46:12.817979880 +1100
+++ nvidia-modeset/nvidia-modeset-linux.c       2020-11-24 10:50:31.474395155 +1100
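
One possible way to rewrite the header, as a sketch only: the sed expressions below add the kernel.orig/ and kernel/ prefixes and drop the .org suffix so the two header lines look like the NVIDIA ones. The temporary directory and the bsingharora-fixed.patch name are made up for the demonstration, and I have not confirmed which strip level the installer uses, so whether --apply-patch then accepts the result is an assumption.

```shell
# Work on a throwaway copy of the first lines of the patch.
work="$(mktemp -d)"
cat > "$work/bsingharora.patch" <<'EOF'
--- nvidia-modeset/nvidia-modeset-linux.c.org 2020-11-23 20:46:12.817979880 +1100
+++ nvidia-modeset/nvidia-modeset-linux.c 2020-11-24 10:50:31.474395155 +1100
@@ -21,6 +21,7 @@
EOF

# Rewrite only the two file header lines; hunks are left untouched.
sed -E \
    -e 's|^--- nvidia-modeset/([^[:space:]]*)\.org|--- kernel.orig/nvidia-modeset/\1|' \
    -e 's|^\+\+\+ nvidia-modeset/|+++ kernel/nvidia-modeset/|' \
    "$work/bsingharora.patch" > "$work/bsingharora-fixed.patch"

old_hdr="$(sed -n '1p' "$work/bsingharora-fixed.patch")"
new_hdr="$(sed -n '2p' "$work/bsingharora-fixed.patch")"
echo "$old_hdr"
echo "$new_hdr"
rm -rf "$work"
```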

OK, let’s try once more (this time via git)

diff --git a/nvidia-modeset/nvidia-modeset-linux.c b/nvidia-modeset/nvidia-modeset-linux.c
index ffbbeb9..2302541 100644
--- a/nvidia-modeset/nvidia-modeset-linux.c
+++ b/nvidia-modeset/nvidia-modeset-linux.c
@@ -21,6 +21,8 @@
 #include <linux/file.h>
 #include <linux/list.h>
 #include <linux/rwsem.h>
+#include <linux/mm.h>
+#include <linux/version.h>
 
 #include "nvstatus.h"
 
@@ -169,8 +171,9 @@ static inline void nvkms_write_unlock_pm_lock(void)
  * are called while nvkms_lock is held.
  *************************************************************************/
 
-/* Don't use kmalloc for allocations larger than 128k */
-#define KMALLOC_LIMIT (128 * 1024)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 12, 0)
+/* Don't use kmalloc for allocations larger than PAGE_SIZE */
+#define KMALLOC_LIMIT (PAGE_SIZE)
 
 void* NVKMS_API_CALL nvkms_alloc(size_t size, NvBool zero)
 {
@@ -197,6 +200,19 @@ void NVKMS_API_CALL nvkms_free(void *ptr, size_t size)
         vfree(ptr);
     }
 }
+#else
+void* NVKMS_API_CALL nvkms_alloc(size_t size, NvBool zero)
+{
+    if (zero)
+        return kvzalloc(size, GFP_KERNEL);
+    return kvmalloc(size, GFP_KERNEL);
+}
+
+void NVKMS_API_CALL nvkms_free(void *ptr, size_t size)
+{
+    kvfree(ptr);
+}
+#endif
 
 void* NVKMS_API_CALL nvkms_memset(void *ptr, NvU8 c, size_t size)
 {
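
To see which side of the #if above a given kernel would land on, here is a small sketch. The function names are made up, and it assumes the release string has the usual x.y.z form before any distro suffix. Keep in mind that LINUX_VERSION_CODE comes from the kernel headers the module is built against, which is normally but not necessarily the running kernel.

```shell
# Succeeds if kernel release $1 is at least version $2 (both as x.y.z).
kernel_at_least() {
    rel="${1%%-*}"   # drop a distro suffix such as -200.fc32.x86_64
    [ "$(printf '%s\n%s\n' "$2" "$rel" | sort -V | head -n 1)" = "$2" ]
}

# Prints which allocator path the patched nvkms_alloc would be compiled with.
nvkms_alloc_branch() {
    if kernel_at_least "$1" 4.12.0; then
        echo "kvmalloc/kvzalloc/kvfree"
    else
        echo "kmalloc/vmalloc with KMALLOC_LIMIT = PAGE_SIZE"
    fi
}

nvkms_alloc_branch "$(uname -r)"
```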