550.78 won't compile on 6.10-rc1 due to GPL violations and removed follow_pfn()

philmmanjaro · June 1, 2024, 8:33am

Here we go again. The current stable driver, 550.78, does not compile against the latest release candidate of the mainline Linux kernel (6.10-rc1). For the open kernel module there is a patch:

diff --git a/kernel/nvidia/os-mlock.c b/kernel/nvidia/os-mlock.c
index 46f99a1..b8f4100 100644
--- a/kernel/nvidia/os-mlock.c
+++ b/kernel/nvidia/os-mlock.c
@@ -30,11 +30,21 @@ static inline int nv_follow_pfn(struct vm_area_struct *vma,
                                 unsigned long address,
                                 unsigned long *pfn)
 {
-#if defined(NV_UNSAFE_FOLLOW_PFN_PRESENT)
-    return unsafe_follow_pfn(vma, address, pfn);
-#else
-    return follow_pfn(vma, address, pfn);
-#endif
+    int status = 0;
+    spinlock_t *ptl;
+    pte_t *ptep;
+
+    if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+        return status;
+
+    status = follow_pte(vma, address, &ptep, &ptl);
+    if (status)
+        return status;
+    *pfn = pte_pfn(ptep_get(ptep));
+
+    // The lock is acquired inside follow_pte()
+    pte_unmap_unlock(ptep, ptl);
+    return 0;
 }
 
 /*!

You can find the discussion here: `follow_pfn()` is removed from kernel · Issue #642 · NVIDIA/open-gpu-kernel-modules · GitHub

With the closed-source kernel module, however, we again run into GPL violations, similar to the 6.8 kernel series:

  MODPOST /build/linux610-nvidia/src/NVIDIA-Linux-x86_64-550.78-no-compat32/kernel/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'follow_pte'
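
For anyone collecting these failures across driver and kernel versions, here is a minimal Python sketch that pulls the offending symbols out of a build log. It only assumes the error wording shown above; the function name `gpl_only_symbols` and the `SAMPLE_LOG` variable are made up for illustration.

```python
import re

# Matches the modpost error wording seen in the build log above.
GPL_ERROR = re.compile(
    r"ERROR: modpost: GPL-incompatible module (?P<module>\S+) "
    r"uses GPL-only symbol '(?P<symbol>[^']+)'"
)


def gpl_only_symbols(build_log: str) -> dict[str, list[str]]:
    """Map each offending module to the GPL-only symbols it references."""
    found: dict[str, list[str]] = {}
    for line in build_log.splitlines():
        match = GPL_ERROR.search(line)
        if match:
            found.setdefault(match["module"], []).append(match["symbol"])
    return found


# Sample input copied from the log in this post.
SAMPLE_LOG = """\
  MODPOST /build/linux610-nvidia/src/NVIDIA-Linux-x86_64-550.78-no-compat32/kernel/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'follow_pte'
"""

if __name__ == "__main__":
    for module, symbols in gpl_only_symbols(SAMPLE_LOG).items():
        print(module, "->", ", ".join(symbols))
```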

It would be much better if Nvidia made known issues public and released patches as soon as they exist, so that power users and developers can run the latest Linux kernel, debug, and give feedback in time, before the stable kernel reaches end users.

djiony2011 · June 1, 2024, 6:57pm

Yes, I reproduced this today on openSUSE with both the vanilla kernel and the sunlight kernel.

The 470 series does not work either.

djiony2011 · June 1, 2024, 7:05pm

As a workaround, I use this:

Kernel side:

1. Patch:

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 340bbefe5f652..181965356d9cb 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -406,7 +406,7 @@ void __rcu_read_lock(void)
                WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
        barrier();  /* critical section after entry code. */
 }
-EXPORT_SYMBOL_GPL(__rcu_read_lock);
+EXPORT_SYMBOL(__rcu_read_lock);
 
 /*
  * Preemptible RCU implementation for rcu_read_unlock().
@@ -431,7 +431,7 @@ void __rcu_read_unlock(void)
                WARN_ON_ONCE(rrln < 0 || rrln > RCU_NEST_PMAX);
        }
 }
-EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+EXPORT_SYMBOL(__rcu_read_unlock);
 
 /*
  * Advance a ->blkd_tasks-list pointer to the next entry, instead

2. Patch:

diff --git a/mm/memory.c b/mm/memory.c
index d022c84c22080..d00f494c62f2f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6011,7 +6011,7 @@ int follow_pte(struct vm_area_struct *vma, unsigned long address,
 out:
        return -EINVAL;
 }
-EXPORT_SYMBOL_GPL(follow_pte);
+EXPORT_SYMBOL(follow_pte);
 
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 /**
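
To see how a given kernel build actually exports these symbols, one option is to read the `Module.symvers` file from the kernel build tree. The sketch below assumes the tab-separated layout described in the kbuild documentation (CRC, symbol, module, export type, namespace). The helper name `export_types` and the sample lines are hypothetical, and the CRC values are placeholders rather than real checksums.

```python
from typing import Iterable


def export_types(symvers_lines: Iterable[str], wanted: set[str]) -> dict[str, str]:
    """Return {symbol: export type} for the requested symbols.

    Assumes tab-separated fields: CRC, symbol, module, export type, namespace.
    """
    result: dict[str, str] = {}
    for line in symvers_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[1] in wanted:
            result[fields[1]] = fields[3]
    return result


# Hypothetical sample lines; the CRCs are placeholders, not real values.
SAMPLE = [
    "0x00000000\tfollow_pte\tvmlinux\tEXPORT_SYMBOL_GPL\t\n",
    "0x00000000\t__rcu_read_unlock\tvmlinux\tEXPORT_SYMBOL_GPL\t\n",
    "0x00000000\t_printk\tvmlinux\tEXPORT_SYMBOL\t\n",
]

if __name__ == "__main__":
    print(export_types(SAMPLE, {"follow_pte", "__rcu_read_unlock"}))
```

On a real tree the same function can be fed an open file object, for example `export_types(open("Module.symvers"), {"follow_pte"})`, where the path to the file is an assumption that depends on where the kernel was built.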

NVIDIA driver side:

diff --git a/kernel/nvidia/os-mlock.c b/kernel/nvidia/os-mlock.c
index 46f99a1..b8f4100 100644
--- a/kernel/nvidia/os-mlock.c
+++ b/kernel/nvidia/os-mlock.c
@@ -30,11 +30,21 @@ static inline int nv_follow_pfn(struct vm_area_struct *vma,
                                 unsigned long address,
                                 unsigned long *pfn)
 {
-#if defined(NV_UNSAFE_FOLLOW_PFN_PRESENT)
-    return unsafe_follow_pfn(vma, address, pfn);
-#else
-    return follow_pfn(vma, address, pfn);
-#endif
+    int status = 0;
+    spinlock_t *ptl;
+    pte_t *ptep;
+
+    if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+        return status;
+
+    status = follow_pte(vma, address, &ptep, &ptl);
+    if (status)
+        return status;
+    *pfn = pte_pfn(ptep_get(ptep));
+
+    // The lock is acquired inside follow_pte()
+    pte_unmap_unlock(ptep, ptl);
+    return 0;
 }
 
 /*!

And the driver now loads:

sudo dmesg | grep -E "NV|vmlinuz" 
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.10.0-rc1-x64zen4+ root=UUID=a7528a1d-9d96-4917-87e7-63a3c2870243 nouveau.modeset=0 clocksource=tsc tsc=reliable splash resume=/dev/disk/by-uuid/79c14cbd-bdd5-48d9-b6ab-30d060ac0cd7 mitigations=auto quiet security=apparmor nosimplefb=1
[    0.000000] BIOS-e820: [mem 0x000000000a200000-0x000000000a20efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000ec253000-0x00000000ec54cfff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x000000000a200000-0x000000000a20efff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000ec253000-0x00000000ec54cfff] ACPI NVS
[    0.045954] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.10.0-rc1-x64zen4+ root=UUID=a7528a1d-9d96-4917-87e7-63a3c2870243 nouveau.modeset=0 clocksource=tsc tsc=reliable splash resume=/dev/disk/by-uuid/79c14cbd-bdd5-48d9-b6ab-30d060ac0cd7 mitigations=auto quiet security=apparmor nosimplefb=1
[    0.046042] Unknown kernel command line parameters "splash BOOT_IMAGE=/boot/vmlinuz-6.10.0-rc1-x64zen4+", will be passed to user space.
[    0.288565] ACPI: PM: Registering ACPI NVS region [mem 0x0a200000-0x0a20efff] (61440 bytes)
[    0.288565] ACPI: PM: Registering ACPI NVS region [mem 0xec253000-0xec54cfff] (3121152 bytes)
[    0.336886] ACPI: \_SB_.PCI0.GPP6.P0NV: New power resource
[    1.738203]     BOOT_IMAGE=/boot/vmlinuz-6.10.0-rc1-x64zen4+
[    6.154214] nvidia: module license 'NVIDIA' taints kernel.
[    6.351520] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.239.06  Sat Feb  3 06:03:07 UTC 2024
[    6.463612] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.239.06  Sat Feb  3 06:03:51 UTC 2024
[    6.736055] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input19
[    6.736135] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input20
[    6.736245] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input21
[    6.736328] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.1/0000:01:00.1/sound/card0/input22

philmmanjaro · June 2, 2024, 1:29am

Removing the GPL-only markers from kernel exports is against the kernel's license. Let's see what the official solution by Nvidia will be.

jrelvas · June 3, 2024, 8:37am

The GPL restriction doesn’t apply to the open kernel module, so if that works properly for you, consider using it.
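
A rough illustration of why the two modules are treated differently: the kernel decides whether a module may use GPL-only symbols from its declared license string. The Python sketch below mirrors the list of strings accepted by the kernel's `license_is_gpl_compatible()` helper in `include/linux/license.h`. That the open module declares `Dual MIT/GPL` is my assumption; the `NVIDIA` string for the closed module is taken from the dmesg output earlier in this thread. The Python names are made up for illustration.

```python
# License strings the kernel treats as GPL-compatible
# (mirrors license_is_gpl_compatible() in include/linux/license.h).
GPL_COMPATIBLE_LICENSES = frozenset({
    "GPL",
    "GPL v2",
    "GPL and additional rights",
    "Dual BSD/GPL",
    "Dual MIT/GPL",
    "Dual MPL/GPL",
})


def is_gpl_compatible(license_string: str) -> bool:
    """Return True if a module with this license string may use GPL-only symbols."""
    return license_string in GPL_COMPATIBLE_LICENSES


if __name__ == "__main__":
    # "NVIDIA" comes from the dmesg line "module license 'NVIDIA' taints kernel".
    # "Dual MIT/GPL" for the open module is an assumption.
    for name in ("NVIDIA", "Dual MIT/GPL"):
        print(f"{name!r}: GPL-compatible = {is_gpl_compatible(name)}")
```

The match is exact and case-sensitive, as it is in the kernel, so any string outside the list is treated as proprietary.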

SoftExpert · July 16, 2024, 9:45am

Now that kernel 6.10 has been released, we are still missing a working patch.
In my case, nvidia-470.256.02 fails to build its kernel modules; applying the patch from the open kernel module results in the GPL-only symbol errors:

ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'follow_pte'

Is there any workaround that does not involve modifying the kernel sources?

Could it be that CVE-2024-38610 has delayed the release of a patch? (I see follow_pte mentioned multiple times.)

*edited to add CVE reference
