Linux 4.7-rc7 - 367.35 build errors

First, this commit https://goo.gl/KFpQjU adds a function, radix_tree_empty(), with the same name as an NVIDIA function in nvidia-uvm/uvm_linux.h.

Since that kernel commit also contains the only reference to the function, the easiest workaround I could think of was to rename the kernel function. It might be possible to rename the function in the NVIDIA source instead, but it might be referenced in their blob, so I thought it best to wait for them to rename it.

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index cb4b7e8..12bce65 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -124,7 +124,7 @@ do {                                                                        \
        (root)->rnode = NULL;                                           \
 } while (0)
 
-static inline bool radix_tree_empty(struct radix_tree_root *root)
+static inline bool radix_tree_is_empty(struct radix_tree_root *root)
 {
        return root->rnode == NULL;
 }
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8798b6c..6d44f92 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -139,7 +139,7 @@ void irq_domain_remove(struct irq_domain *domain)
 {
        mutex_lock(&irq_domain_mutex);
 
-       WARN_ON(!radix_tree_empty(&domain->revmap_tree));
+       WARN_ON(!radix_tree_is_empty(&domain->revmap_tree));
 
        list_del(&domain->link);

Second, this commit https://goo.gl/Bi2QtP removed a parameter (the drm_device pointer) from the drm_gem_object_lookup() definition. The fix is to remove that parameter from the two call sites in the driver:

diff --git a/nvidia-drm/nvidia-drm-fb.c b/nvidia-drm/nvidia-drm-fb.c
index dccad84..39d9090 100644
--- a/nvidia-drm/nvidia-drm-fb.c
+++ b/nvidia-drm/nvidia-drm-fb.c
@@ -114,7 +114,7 @@ static struct drm_framebuffer *internal_framebuffer_create
      * We don't support any planar format, pick up first buffer only.
      */
 
-    gem = drm_gem_object_lookup(dev, file, cmd->handles[0]);
+    gem = drm_gem_object_lookup(file, cmd->handles[0]);
 
     if (gem == NULL)
     {
diff --git a/nvidia-drm/nvidia-drm-gem.c b/nvidia-drm/nvidia-drm-gem.c
index 6e265ce..2873ca4 100644
--- a/nvidia-drm/nvidia-drm-gem.c
+++ b/nvidia-drm/nvidia-drm-gem.c
@@ -408,7 +408,7 @@ int nvidia_drm_dumb_map_offset
 
     mutex_lock(&dev->struct_mutex);
 
-    gem = drm_gem_object_lookup(dev, file, handle);
+    gem = drm_gem_object_lookup(file, handle);
 
     if (gem == NULL)
     {

How about just using the 4.7 kernel’s radix_tree_empty function (when available)?
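
A rough sketch of what "use the kernel's function when available" could look like. It is written as user-space C so it compiles outside a kernel tree: struct radix_tree_root here is a one-field stand-in, NV_RADIX_TREE_EMPTY_PRESENT is a hypothetical macro named by analogy with NV_USLEEP_RANGE_PRESENT in uvm_linux.h, and nv_radix_tree_empty() is a made-up wrapper name. The real driver fallback uses radix_tree_gang_lookup(), which the mock replaces with a pointer check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct radix_tree_root (assumption: only
 * the rnode field matters for this sketch). */
struct radix_tree_root {
    void *rnode;
};

#if defined(NV_RADIX_TREE_EMPTY_PRESENT)
/* Simulates the definition that 4.7's <linux/radix-tree.h> provides.
 * In a real build this would come from the kernel header. */
static inline bool radix_tree_empty(struct radix_tree_root *root)
{
    return root->rnode == NULL;
}

/* Kernel already has the helper: forward to it, define nothing new. */
#define nv_radix_tree_empty(tree) radix_tree_empty(tree)
#else
/* Older kernel: supply the helper under a driver-private name so it
 * can never collide with a symbol the kernel adds later. The real
 * driver code calls radix_tree_gang_lookup() here; the mock just
 * inspects the root pointer. */
static inline bool nv_radix_tree_empty(struct radix_tree_root *tree)
{
    return tree->rnode == NULL;
}
#endif
```

Callers would then use nv_radix_tree_empty() everywhere, and the name clash goes away on both old and new kernels.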

Also, hey NVIDIA, how about a (beta?) release to address these issues?

Both issues still occur with 367.35 and Linux 4.7.0-rc7.

@slyrus
I never tried it. I didn’t look into it much, but from what I remember the implementation was different.

They usually don’t address issues that occur in Linux -rc releases. It seems they develop against stable releases.

This patch has worked for me for the last month of rc/NVIDIA releases:

diff -ur NVIDIA-Linux-x86_64-367.27/kernel/nvidia-drm/nvidia-drm-fb.c b/kernel/nvidia-drm/nvidia-drm-fb.c
--- NVIDIA-Linux-x86_64-367.27/kernel/nvidia-drm/nvidia-drm-fb.c	2016-06-10 02:38:43.000000000 +0200
+++ b/kernel/nvidia-drm/nvidia-drm-fb.c	2016-06-14 02:45:44.263506669 +0200
@@ -114,7 +114,7 @@
      * We don't support any planar format, pick up first buffer only.
      */
 
-    gem = drm_gem_object_lookup(dev, file, cmd->handles[0]);
+    gem = drm_gem_object_lookup(file, cmd->handles[0]);
 
     if (gem == NULL)
     {
diff -ur NVIDIA-Linux-x86_64-367.27/kernel/nvidia-drm/nvidia-drm-gem.c b/kernel/nvidia-drm/nvidia-drm-gem.c
--- NVIDIA-Linux-x86_64-367.27/kernel/nvidia-drm/nvidia-drm-gem.c	2016-06-10 02:38:43.000000000 +0200
+++ b/kernel/nvidia-drm/nvidia-drm-gem.c	2016-06-14 02:45:44.263506669 +0200
@@ -408,7 +408,7 @@
 
     mutex_lock(&dev->struct_mutex);
 
-    gem = drm_gem_object_lookup(dev, file, handle);
+    gem = drm_gem_object_lookup(file, handle);
 
     if (gem == NULL)
     {
diff -ur NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm_linux.h b/kernel/nvidia-uvm/uvm_linux.h
--- NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm_linux.h	2016-06-10 02:37:08.000000000 +0200
+++ b/kernel/nvidia-uvm/uvm_linux.h	2016-06-14 02:49:35.214495381 +0200
@@ -563,12 +563,13 @@
     INIT_RADIX_TREE(tree, GFP_NOWAIT);
 }
 
+/*
 static bool radix_tree_empty(struct radix_tree_root *tree)
 {
     void *dummy;
     return radix_tree_gang_lookup(tree, &dummy, 0, 1) == 0;
 }
-
+*/
 
 #if !defined(NV_USLEEP_RANGE_PRESENT)
 static void __sched usleep_range(unsigned long min, unsigned long max)

I bumped into the same thing last night on my Arch box. Thanks for digging into the kernel commits to find the reason. I posted a similar patch via pastebin to the main forums.geforce.com boards and was kindly redirected here.

My very similar patch (that “just uses” the kernel-provided function) is at http://pastebin.com/ux67Mkmr (valid until 2016.09.20) and my result seems to “just work”.

I agree that this is something NVIDIA will need to include in their package, at least when it detects that it’s being installed on a 4.7-or-later kernel. Otherwise this is a show-stopper bug. It doesn’t have a lot of visibility today, but as more and more distros roll over to 4.7 it will surely become common.
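
For the drm_gem_object_lookup() half, detecting the kernel at build time could be as simple as a version check. A minimal sketch, again mocked in user-space C so it compiles on its own: the struct definitions and the lookup bodies are stand-ins, nv_drm_gem_object_lookup() is a hypothetical wrapper name, and the 4.7.0 threshold is taken from this thread rather than checked against every stable tree.

```c
#include <stddef.h>

/* In a kernel build these come from <linux/version.h>; they are
 * defined here only so the sketch compiles in user space. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(4, 7, 0) /* pretend: 4.7 */
#endif

/* Mock DRM types (assumption: real ones come from the DRM headers). */
struct drm_device { int unused; };
struct drm_file { int unused; };
struct drm_gem_object { unsigned int handle; };

static struct drm_gem_object mock_object;

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0)
/* Mock of the 4.7+ signature: no drm_device argument. */
static struct drm_gem_object *
drm_gem_object_lookup(struct drm_file *file, unsigned int handle)
{
    (void)file;
    mock_object.handle = handle;
    return &mock_object;
}
/* Drop the device argument on new kernels. */
#define nv_drm_gem_object_lookup(dev, file, handle) \
    drm_gem_object_lookup((file), (handle))
#else
/* Mock of the pre-4.7 signature: device comes first. */
static struct drm_gem_object *
drm_gem_object_lookup(struct drm_device *dev, struct drm_file *file,
                      unsigned int handle)
{
    (void)dev;
    (void)file;
    mock_object.handle = handle;
    return &mock_object;
}
/* Pass everything through on old kernels. */
#define nv_drm_gem_object_lookup(dev, file, handle) \
    drm_gem_object_lookup((dev), (file), (handle))
#endif
```

The two call sites in nvidia-drm-fb.c and nvidia-drm-gem.c would keep passing all three arguments to the wrapper, so the same source builds against both signatures.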

cheers…ank

Thanks for the patches. The 4.7 kernel hit Fedora 24, so this is now a problem for a lot more people.

Please test with the upcoming r370 driver and post an update.

Ran into this issue as well. I upgraded to driver 370.38 on Fedora 24 with kernel 4.7.3-200, and it appears to be working really well.

This is also a problem when trying to use the 364 drivers with the 4.8 kernel pushed to the Ubuntu 16.10 testers today.

I can’t use the newer 370 drivers because of hard hangs with anything after v364: https://devtalk.nvidia.com/default/topic/937319/linux/367-370-xx-980m-w-4k-screen-lock-up-at-boot-ubuntu-16-10-/. So, I’m kind of boned here unless I want to switch to nouveau or run an old or custom kernel on my test system.