Linux 4.9-git(rc1) - 375.10 build errors

  1. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/linux/mm.h?id=768ae309a96103ed02eb1e111e838c87854d8b51 - This commit replaces the write/force parameters of get_user_pages() with a single gup_flags argument, but the nvidia driver still uses the old parameters, which causes a lot of build errors.

  2. The build fails with an error in nvidia-drm-mmap.c: drm_vma_node_is_allowed() now expects a struct drm_file *, but the driver passes filp, which is a struct file *. I don't know which kernel commit causes this (maybe https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/drm/drm_vma_manager.h?id=d9a1f0b4eb6080dc42bb6373ab9abb0314cea41e).

I’ve made a patch that fixes both issues:

# Issue №1
diff -Naur kernel/common/inc/nv-mm.h b/kernel/common/inc/nv-mm.h
--- kernel/common/inc/nv-mm.h	2016-10-22 13:42:48.220791689 +0300
+++ b/kernel/common/inc/nv-mm.h	2016-10-22 13:51:23.906406480 +0300
@@ -45,8 +45,8 @@
     #define NV_GET_USER_PAGES           get_user_pages
     #define NV_GET_USER_PAGES_REMOTE    get_user_pages_remote
 #else
-    #define NV_GET_USER_PAGES(start, nr_pages, write, force, pages, vmas) \
-        get_user_pages(current, current->mm, start, nr_pages, write, force, pages, vmas)
+    #define NV_GET_USER_PAGES(start, nr_pages, gup_flags, pages, vmas) \
+        get_user_pages(current, current->mm, start, nr_pages, gup_flags, pages, vmas)
 
     #define NV_GET_USER_PAGES_REMOTE    get_user_pages
 #endif
diff -Naur kernel/nvidia/os-mlock.c b/kernel/nvidia/os-mlock.c
--- kernel/nvidia/os-mlock.c	2016-10-22 13:42:48.222791633 +0300
+++ b/kernel/nvidia/os-mlock.c	2016-10-22 13:51:23.908406424 +0300
@@ -117,7 +117,7 @@
 
     down_read(&mm->mmap_sem);
     ret = NV_GET_USER_PAGES((unsigned long)address,
-                            page_count, write, force, user_pages, NULL);
+                            page_count, (write ? FOLL_WRITE : 0) | (force ? FOLL_FORCE : 0), user_pages, NULL);
     up_read(&mm->mmap_sem);
     pinned = ret;
 
diff -Naur kernel/nvidia-drm/nvidia-drm-linux.c b/kernel/nvidia-drm/nvidia-drm-linux.c
--- kernel/nvidia-drm/nvidia-drm-linux.c	2016-10-22 13:42:48.243791048 +0300
+++ b/kernel/nvidia-drm/nvidia-drm-linux.c	2016-10-22 13:51:23.931405780 +0300
@@ -137,7 +137,7 @@
 
     down_read(&mm->mmap_sem);
 
-    pages_pinned = NV_GET_USER_PAGES(address, pages_count, write, force,
+    pages_pinned = NV_GET_USER_PAGES(address, pages_count, (write ? FOLL_WRITE : 0) | (force ? FOLL_FORCE : 0),
                                      user_pages, NULL);
     up_read(&mm->mmap_sem);
  
diff -Naur kernel/nvidia-uvm/uvm8_tools.c b/kernel/nvidia-uvm/uvm8_tools.c
--- kernel/nvidia-uvm/uvm8_tools.c	2016-10-22 13:42:48.251790826 +0300
+++ b/kernel/nvidia-uvm/uvm8_tools.c	2016-10-22 13:51:23.938405584 +0300
@@ -224,7 +224,7 @@
     }
 
     down_read(&current->mm->mmap_sem);
-    ret = NV_GET_USER_PAGES(user_va, num_pages, 1, 0, *pages, vmas);
+    ret = NV_GET_USER_PAGES(user_va, num_pages, FOLL_WRITE, *pages, vmas);
     up_read(&current->mm->mmap_sem);
     if (ret != num_pages) {
         status = NV_ERR_INVALID_ARGUMENT;

# Issue №2
diff -Naur kernel/nvidia-drm/nvidia-drm-mmap.c b/kernel/nvidia-drm/nvidia-drm-mmap.c
--- kernel/nvidia-drm/nvidia-drm-mmap.c	2016-10-22 13:42:48.243791048 +0300
+++ b/kernel/nvidia-drm/nvidia-drm-mmap.c	2016-10-22 13:51:23.931405780 +0300
@@ -113,7 +113,7 @@
 
     /* Check the caller has been granted access to the buffer object */
 
-    if (!drm_vma_node_is_allowed(&gem->vma_node, filp))
+    if (!drm_vma_node_is_allowed(&gem->vma_node, filp->private_data))
     {
         ret = -EACCES;

I can confirm that the patched 375.10 driver works like a charm on my Fedora 25 with the 4.9.0-0.rc1.git3.2 kernel.

Did you downgrade the Xorg server?

Yeah, sure. Xorg API 24 isn't supported by the nvidia driver yet.

Many thanks for the patches; they also work with kernel 4.9-rc3.

http://www.computer-retro.de/Bilder/nvidia-375-10-with-49-rc3.png

If you're on drm-next or drm-intel-nightly, another issue pops up: nvidia-drivers won't compile because <linux/fence.h> is missing (it has been renamed to <linux/dma-fence.h>):

https://cgit.freedesktop.org/drm-intel/commit/include/linux/fence.h?id=f54d1867005c3323f5d8ad83eed823e84226c429

And also this: https://cgit.freedesktop.org/drm-intel/commit/include/drm/drm_atomic.h?id=0853695c3ba46f97dfc0b5885f7b7e640ca212dd

I got the driver to compile against these changes by doing the following:

The Coccinelle script from the patch posting:

@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )

  1. Run that semantic patch with spatch --sp-file "/path/to/patchfile.cocci" --in-place --include-headers --dir "workdir"

  2. Replace all instances of #include <linux/fence.h> with #include <linux/dma-fence.h>.

  3. Remove or comment out every use of drm_atomic_state_free: delete the nvidia_drm_atomic_state_free function from nvidia-drm-modeset.c entirely, remove all calls to drm_atomic_state_free in the code (search for it), and remove ".atomic_state_free = nvidia_drm_atomic_state_free," from nvidia-drm-drv.c.

I'm guessing these steps are the wrong solution, though.

For #3, it looks like drm_atomic_state_put() is what you want now, per https://github.com/torvalds/linux/commit/0853695c3ba46f97dfc0b5885f7b7e640ca212dd.

Ah, nice! Then a simple patch or Coccinelle run should be enough to make it work nicely :)

I'm back to running the mainline kernel now, though, so I can't test it just yet…