OpenCL not working with kernel 5.9

With kernel 5.9, NVIDIA OpenCL is not working anymore.

example: Darktable

1# darktable -d opencl
0.065023 [opencl_init] opencl related configuration options:
0.065030 [opencl_init] 
0.065031 [opencl_init] opencl: 1
0.065032 [opencl_init] opencl_scheduling_profile: 'default'
0.065034 [opencl_init] opencl_library: ''
0.065035 [opencl_init] opencl_memory_requirement: 768
0.065036 [opencl_init] opencl_memory_headroom: 400
0.065037 [opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
0.065039 [opencl_init] opencl_mandatory_timeout: 200
0.065040 [opencl_init] opencl_size_roundup: 16
0.065041 [opencl_init] opencl_async_pixelpipe: 0
0.065042 [opencl_init] opencl_synch_cache: active module
0.065043 [opencl_init] opencl_number_event_handles: 25
0.065044 [opencl_init] opencl_micro_nap: 1000
0.065045 [opencl_init] opencl_use_pinned_memory: 0
0.065048 [opencl_init] opencl_use_cpu_devices: 0
0.065050 [opencl_init] opencl_avoid_atomics: 0
0.065052 [opencl_init] 
0.065183 [opencl_init] found opencl runtime library 'libOpenCL'
0.065194 [opencl_init] opencl library 'libOpenCL' found on your system and loaded
0.118187 [opencl_init] could not get platforms: -1001
0.118197 [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
0.118198 [opencl_init] initial status of opencl enabled flag is OFF.

In the journal I see the following messages when darktable tries to use opencl:

nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
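
To pull just the unresolved symbol names out of a longer journal, something like the following might help. It is only a sketch: the helper name `unknown_symbols` is made up for illustration, and it assumes the kernel messages keep the "<module>: Unknown symbol <name> (err -2)" shape shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA or kernel tooling):
# read kernel log text on stdin and print the distinct symbol names that
# the given module failed to resolve.
unknown_symbols() {
    mod=$1
    # Keep only "<mod>: Unknown symbol <name> ..." lines and print <name>.
    sed -n "s/.*${mod}: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p" | LC_ALL=C sort -u
}

# Usage on a systemd system (kernel messages of the current boot):
#   journalctl -k -b | unknown_symbols nvidia_uvm
```

Sorting with `LC_ALL=C` keeps the output order stable regardless of the system locale.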

Is this related to https://www.phoronix.com/scan.php?page=news_item&px=Linux-59-Proprietary-Shim-Taint ?

Yes, this is because of the “GPL condom” that prevents loading the UVM module…
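
One way to check whether the symbols named in the journal are GPL-only exports is to look them up in the kernel's Module.symvers. This is a rough sketch, not an official procedure: the helper name `gpl_only_symbols` is invented here, the file is assumed to be tab-separated with the columns CRC, symbol, module, export type, and the path in the usage comment is an assumption that depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print those of the given symbols that a
# Module.symvers file lists with export type EXPORT_SYMBOL_GPL.
# Assumed column layout (tab-separated): CRC, symbol, module, export type.
gpl_only_symbols() {
    symvers=$1
    shift
    for sym in "$@"; do
        awk -F'\t' -v s="$sym" \
            '$2 == s && $4 == "EXPORT_SYMBOL_GPL" { print $2 }' "$symvers"
    done
}

# Usage (path is an assumption; it exists only if the kernel build files
# are installed):
#   gpl_only_symbols "/lib/modules/$(uname -r)/build/Module.symvers" \
#       set_cpus_allowed_ptr mmu_notifier_unregister __mmu_notifier_register
```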

BOTH Linux kernel developers and NVIDIA are to blame here.

The first, because they behave like Open Source zealots (and in the process violate laws such as EU law, which does NOT allow API copyrighting): imagine what would happen to Open Source if Micro$oft or Apple suddenly behaved in the exact same way and forbade the use of their OS’ APIs unless the software using them carried the same license… Open Source software would simply no longer run on Windows PCs and Macs, and would be doomed!

The second, because their Open Source support and output are extremely poor (to say the least) without any valid reason (if there were a valid reason to hide software behind closed-source doors, why would the competition contribute more?), and because instead of discussing the issue like civilized people with the Linux kernel developers, they just try to bypass the existing restrictions rather than consider taking the Open Source route (at least for part of their code: no one would blame them for keeping some code closed as a binary blob; almost every other hardware manufacturer, including AMD, does it).

As an end user, it really pisses me off big time.

So here is my advice: until those “fine people” on the Linux kernel and NVIDIA development teams finally find common, sane ground (i.e. behave like adults instead of children in a kindergarten), just do what I did to circumvent this issue:
1.- change the license in the NVIDIA kernel module sources from “NVIDIA” to “GPL”
2.- disable the GPL condom code (just a couple of #if 0 … #endif directives to insert in kernel/module.c; compare the v5.8 and v5.9 kernel sources to find out where to insert them).
Either of the two solutions above will let you recover the full functionality of the NVIDIA drivers on your system, and as long as you do not distribute the resulting binaries, no one can blame you (or sue you) for it!

Thanks for the tips on how to circumvent.

But could you please elaborate on method #1? Where do I find the file or files that contain the license information?

Grep for “MODULE_LICENSE” in the kernel directory after running the installer with --extract-only.
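
Spelled out, that search could look roughly like this. It is only a sketch: the helper name `find_module_license` is made up, and the installer filename and the extracted directory name in the usage comment are assumptions (use whatever your .run file is actually called).

```shell
#!/bin/sh
# Hypothetical helper: list every MODULE_LICENSE declaration (file, line
# number, and text) below the given directory.
find_module_license() {
    grep -rn 'MODULE_LICENSE' "$1"
}

# Usage (filenames are assumptions):
#   sh NVIDIA-Linux-x86_64-455.28.run --extract-only
#   find_module_license NVIDIA-Linux-x86_64-455.28/kernel
```

Each hit is printed as file:line:text, so you can see which source files declare a license and what string each one uses.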

I can’t thank you enough.

I found the files, made a patch for Gentoo, and built the drivers. It all functions normally.

The kernel developers should include some kind of “opt out” for users who just want the NVIDIA driver and do not distribute the product.

Based on what @dinosaur said, I created my own kernel patch to make the NVIDIA drivers work again. The patch disables the code from the commit which introduced the TAINT stuff.

This is my patch:

--- linux-5.9/kernel/module.c.old	2020-10-14 06:51:57.598066293 +0200
+++ linux-5.9/kernel/module.c	2020-10-14 07:58:16.504570606 +0200
@@ -1431,6 +1431,7 @@
 	return 0;
 }
 
+#if 0
 static bool inherit_taint(struct module *mod, struct module *owner)
 {
 	if (!owner || !test_bit(TAINT_PROPRIETARY_MODULE, &owner->taints))
@@ -1449,6 +1450,7 @@
 	}
 	return true;
 }
+#endif
 
 /* Resolve a symbol for this module.  I.e. if we find one, record usage. */
 static const struct kernel_symbol *resolve_symbol(struct module *mod,
@@ -1474,6 +1476,7 @@
 	if (!sym)
 		goto unlock;
 
+#if 0
 	if (license == GPL_ONLY)
 		mod->using_gplonly_symbols = true;
 
@@ -1481,6 +1484,7 @@
 		sym = NULL;
 		goto getname;
 	}
+#endif
 
 	if (!check_version(info, name, mod, crc)) {
 		sym = ERR_PTR(-EINVAL);

It works just fine!
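
Before applying a patch like this, it may be worth checking that the kernel tree actually contains the function it disables (kernels before 5.9 will not). A small sketch, assuming a hypothetical helper name `has_inherit_taint` and example paths that you would need to adapt:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the kernel tree at $1 has a
# kernel/module.c that mentions inherit_taint, i.e. the code the patch
# above wraps in #if 0 ... #endif.
has_inherit_taint() {
    tree=$1
    grep -q 'inherit_taint' "$tree/kernel/module.c" 2>/dev/null
}

# Usage (tree path and patch filename are assumptions):
#   if has_inherit_taint /usr/src/linux-5.9; then
#       cd /usr/src/linux-5.9 &&
#       patch -p1 --dry-run < /path/to/module-taint.patch
#   fi
```

The `--dry-run` pass only reports whether the hunks would apply; drop that option to actually modify the tree.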


Is this a patch for the drivers or for the kernel?

Is this question for me?

Did I write "I created my own kernel patch "?

I assume that it is illegal to patch the nvidia drivers to claim they are GPL. As a user you can do that, but not as a distro provider.

The patch works, but with the latest Linux git pull (commit 071a0578b0ce0b0e543d1e38ee6926b9cc21c198), the driver compile fails …

/v6/src/nvidia-455.28/nvidia/nv-dma.c:631:37: error: implicit declaration of function ‘get_dma_ops’; did you mean ‘get_mm_rss’? [-Werror=implicit-function-declaration]
const struct dma_map_ops *ops = get_dma_ops(dma_dev->dev);
^~~~~~~~~~~
get_mm_rss
/v6/src/nvidia-455.28/nvidia/nv-dma.c:631:37: warning: initialization of ‘const struct dma_map_ops *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
/v6/src/nvidia-455.28/nvidia/nv-dma.c:646:16: error: dereferencing pointer to incomplete type ‘const struct dma_map_ops’
return (ops->map_resource != NULL);
^~
/v6/src/nvidia-455.28/nvidia/nv-dma.c:650:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

I think that error can be fixed with this …

--- nvidia-455.28/nvidia/nv-dma.c 2020-10-17 14:36:50.215295676 +0800
+++ nvidia-455.28/nvidia/nv-dma.c 2020-10-17 14:38:42.075296625 +0800
@@ -627,26 +627,7 @@
     nv_dma_device_t *dma_dev
 )
 {
-#if defined(NV_DMA_MAP_RESOURCE_PRESENT)
-    const struct dma_map_ops *ops = get_dma_ops(dma_dev->dev);
-    if (ops == NULL)
-    {
-        /* On pre-5.0 kernels, if dma_map_resource() is present, then we
-         * assume that ops != NULL.  With direct_dma handling swiotlb on 5.0+
-         * kernels, ops == NULL.
-         */
-#if defined(NV_DMA_IS_DIRECT_PRESENT)
         return NV_TRUE;
-#else
-        return NV_FALSE;
-#endif
-    }
-    return (ops->map_resource != NULL);
-#else
-    return NV_FALSE;
-#endif
 }
 
 /* DMA-map a peer PCI device's BAR for peer access. */

But the following error about “dev” … I don’t know how to fix that …

/v6/src/nvidia-455.28/nvidia-drm/nvidia-drm-gem-user-memory.c:63:12: error: too few arguments to function ‘drm_prime_pages_to_sg’
return drm_prime_pages_to_sg(nv_user_memory->pages,
^~~~~~~~~~~~~~~~~~~~~
In file included from /v6/src/nvidia-455.28/nvidia-drm/nvidia-drm-gem-user-memory.c:28:
./include/drm/drm_prime.h:91:18: note: declared here
struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
^~~~~~~~~~~~~~~~~~~~~

It looks like an additional “dev” argument needs to be passed to the function drm_prime_pages_to_sg() …

Jeff

Hi Jeff,

Thanks for your effort. But commit 071a0578b0ce0b0e543d1e38ee6926b9cc21c198 is the head of kernel 5.10-rc, not 5.9.x. (Look at the git log. AFAIK Linus Torvalds’ tree always carries the next kernel.) I guess it’s too early… but anyway, your post will help in the near future!