Linux 6.7.3 + 545.29.06/550.40.07: ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'

Well, I’m confused. Normally when there’s an error building NVIDIA modules, someone has already posted info and/or patches to fix it, but this time around there are none:

  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/nvidia.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/nvidia-modeset.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/nvidia-uvm.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/nvidia-peermem.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/nvidia-drm.o
  MODPOST /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
make[3]: *** [scripts/Makefile.modpost:145: /tmp/1/NVIDIA-Linux-x86_64-550.40.07/kernel/Module.symvers] Error 1
make[2]: *** [/tmp/linux-6.7/Makefile:1863: modpost] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/tmp/linux-6.7'
make: *** [Makefile:85: modules] Error 2

Just in case here’s my .config:

config.zip (31.0 KB)

Since I’m a simple man and I have no effing clue how to fix this and I have my own kernel, I’ve simply patched it so that I could compile the NVIDIA driver:

diff --git a/linux-6.7/kernel/rcu/tree_plugin.h b/linux-6.7-nvidia-550.40.07/kernel/rcu/tree_plugin.h
index 4102108..72474d8 100644
--- a/linux-6.7/kernel/rcu/tree_plugin.h
+++ b/linux-6.7-nvidia-550.40.07/kernel/rcu/tree_plugin.h
@@ -406,7 +406,7 @@ void __rcu_read_lock(void)
 		WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
 	barrier();  /* critical section after entry code. */
 }
-EXPORT_SYMBOL_GPL(__rcu_read_lock);
+EXPORT_SYMBOL(__rcu_read_lock);
 
 /*
  * Preemptible RCU implementation for rcu_read_unlock().
@@ -431,7 +431,7 @@ void __rcu_read_unlock(void)
 		WARN_ON_ONCE(rrln < 0 || rrln > RCU_NEST_PMAX);
 	}
 }
-EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+EXPORT_SYMBOL(__rcu_read_unlock);
 
 /*
  * Advance a ->blkd_tasks-list pointer to the next entry, instead

The previous stable NVIDIA driver produces a metric ton of errors during compilation. Am I the only person running Linux 6.7? It’s been out for almost a month now. Doesn’t seem likely.

There are many users on kernel 6.7 (Arch, Fedora, Liquorix), and I know of no issues so far. Furthermore, __rcu_read_unlock and __rcu_read_lock have been GPL-only exports before. Also, 535 and 545 should compile on 6.7.

I see

modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'

only on kernel 6.8-rc2.
So I can confirm that, for me, the 535, 545 and 550 drivers do compile against the 6.7 kernel.

I am getting the same error with 545.29.06:

  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia-uvm.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia-modeset.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia-drm.o
  LD [M]  /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia-peermem.o
  MODPOST /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
make[3]: *** [scripts/Makefile.modpost:145: /tmp/1/NVIDIA-Linux-x86_64-545.29.06/kernel/Module.symvers] Error 1
make[2]: *** [/tmp/linux-6.7/Makefile:1863: modpost] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/tmp/linux-6.7'
make: *** [Makefile:82: modules] Error 2

I’m not hallucinating and I’m using the vanilla 6.7.3 kernel.

strings /NVIDIA-Linux-x86_64-545.29.06/kernel/nvidia.o | grep rcu_read_lock
__rcu_read_lock

The driver absolutely uses this symbol and it’s marked GPL only in 6.7.3. Again, either everyone is lucky or they have patched the kernel.
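
For anyone wondering what modpost is actually complaining about: roughly, it walks the symbols a module leaves undefined, looks up how the kernel exported each one, and rejects GPL-only exports for modules whose license is not GPL-compatible. Below is a minimal userspace sketch of that rule. It is not the real scripts/mod/modpost.c; the struct, the table, the function name count_gpl_violations and the printk_sketch entry are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one exported kernel symbol. */
struct export_entry {
    const char *name;
    bool gpl_only; /* true: EXPORT_SYMBOL_GPL, false: EXPORT_SYMBOL */
};

/* Illustrative table. The two RCU entries mirror the exports quoted
 * in this thread; printk_sketch is an invented non-GPL-only symbol. */
static const struct export_entry exports[] = {
    { "__rcu_read_lock",   true  },
    { "__rcu_read_unlock", true  },
    { "printk_sketch",     false },
};

/* Return how many of the module's undefined symbols resolve to
 * GPL-only exports that the module's license does not permit. */
static int count_gpl_violations(const char *mod, bool gpl_compatible,
                                const char *const *undefined, size_t n)
{
    int violations = 0;

    assert(mod != NULL && undefined != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < sizeof(exports) / sizeof(exports[0]); j++) {
            if (strcmp(undefined[i], exports[j].name) != 0)
                continue;
            if (exports[j].gpl_only && !gpl_compatible) {
                fprintf(stderr, "ERROR: modpost: GPL-incompatible module "
                        "%s.ko uses GPL-only symbol '%s'\n",
                        mod, undefined[i]);
                violations++;
            }
        }
    }
    return violations;
}
```

In this sketch, switching an entry's gpl_only flag to false plays the same role as the EXPORT_SYMBOL_GPL to EXPORT_SYMBOL change in the kernel patch posted above: the symbol reference stays, only the license check stops firing.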

OK, then it’s the same change that was introduced in 6.8-rcX.

Until 6.7.2 there was no problem at all.

Taken from my 5.15.148 kernel:

EXPORT_SYMBOL_GPL(__rcu_read_unlock);

Both the __rcu_read_lock and __rcu_read_unlock symbols are already GPL-only in Linux 6.7.

No idea why I’m seemingly the only affected person.

I have some debug options enabled but nothing extraordinary:

# grep CONFIG_DEBUG .config | grep -v "not set"
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO_NONE=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_DEBUG_WX=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y

Wait a minute, I’m building 6.7.3 right now so I can check whether I’m affected too.

AFAIK, DEBUG_KERNEL can lead to unwanted consequences with the nvidia driver; maybe yours is one of those cases.

Fedora has all these debug options enabled sans CONFIG_DEBUG_INFO_NONE which is disabled.

I want a faster PC, that’s why I have it enabled.

And some of these options are not related to debugging at all.

I can confirm: after updating from 6.7.2 to 6.7.3, the nvidia kernel module fails to build.
There was no change in the config between 6.7.2 and 6.7.3.

I suspect this kernel commit is the cause: it adds rcu_read_lock/rcu_read_unlock to the static inline pfn_valid, so the calls end up in the nvidia driver.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/include/linux/mmzone.h?h=v6.7.3&id=3a01daace71b521563c38bbbf874e14c3e58adb7
Now the fun starts.
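
To illustrate why a header change alone can break an out-of-tree module: a static inline function is compiled into every file that calls it, so whatever it calls becomes a symbol reference of the caller's own object. Here is a userspace sketch under that assumption. All sketch_ names are stand-ins, not kernel functions, and the real pfn_valid does a sparsemem section lookup that is replaced here by a simple bound check. In the kernel the two lock functions live in vmlinux and are imported by the module; here they are defined in the same file so the example compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's out-of-line, GPL-only exported functions. */
static int lock_calls, unlock_calls;
void sketch_rcu_read_lock(void)   { lock_calls++; }
void sketch_rcu_read_unlock(void) { unlock_calls++; }

/* Header-style helper. Because it is static inline, its body (including
 * the two calls) is emitted into every translation unit that uses it. */
static inline bool sketch_pfn_valid(unsigned long pfn, unsigned long max_pfn)
{
    bool valid;

    sketch_rcu_read_lock();
    valid = pfn < max_pfn; /* placeholder for the real section lookup */
    sketch_rcu_read_unlock();
    return valid;
}

/* "Driver" code: its source never mentions RCU, yet after the header
 * change its object code calls the lock and unlock functions. */
bool sketch_driver_check(unsigned long pfn)
{
    bool ok = sketch_pfn_valid(pfn, 1024UL);

    assert(lock_calls == unlock_calls); /* read-side section was closed */
    return ok;
}
```

That matches what the strings output earlier in the thread shows: the driver's own code did not need to change for __rcu_read_lock to appear in nvidia.o.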

So, a kernel bug? Is it confirmed that kernel 6.7.3 behaves like that? I’m on 6.7.2 so far, and if so, I will hold off on upgrading.

Not a bug.

Is there any patch to enable building with 6.7.3, or should we switch to the open kernel driver?

This worked for me for both 6.7.3 and 6.8.0-rc2.

I’m also having this problem on NixOS as of an update this morning.

For those that do not want to wait for NVIDIA’s fixed releases, I wrote a quick workaround patch for Gentoo that does not need to modify the kernel.

Not very tested, but I don’t see how it could cause problems (well, it lacks the kernel’s race fix, but it should be no worse than it was before, and that race may not have affected nvidia either way).

The patch is based on 470 but can be applied up to 550 with fuzz 1. Legacy (unsupported) 390 should be fine without it for most users, given that pfn_valid() is only used on ppc64 there.

Looks like they backported it to 6.6.15 as well.