Opencl not working with kernel 5.9

mbod · October 13, 2020, 8:40am

With kernel 5.9 nvidia opencl is not working anymore.

example: Darktable

1# darktable -d opencl
0.065023 [opencl_init] opencl related configuration options:
0.065030 [opencl_init] 
0.065031 [opencl_init] opencl: 1
0.065032 [opencl_init] opencl_scheduling_profile: 'default'
0.065034 [opencl_init] opencl_library: ''
0.065035 [opencl_init] opencl_memory_requirement: 768
0.065036 [opencl_init] opencl_memory_headroom: 400
0.065037 [opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
0.065039 [opencl_init] opencl_mandatory_timeout: 200
0.065040 [opencl_init] opencl_size_roundup: 16
0.065041 [opencl_init] opencl_async_pixelpipe: 0
0.065042 [opencl_init] opencl_synch_cache: active module
0.065043 [opencl_init] opencl_number_event_handles: 25
0.065044 [opencl_init] opencl_micro_nap: 1000
0.065045 [opencl_init] opencl_use_pinned_memory: 0
0.065048 [opencl_init] opencl_use_cpu_devices: 0
0.065050 [opencl_init] opencl_avoid_atomics: 0
0.065052 [opencl_init] 
0.065183 [opencl_init] found opencl runtime library 'libOpenCL'
0.065194 [opencl_init] opencl library 'libOpenCL' found on your system and loaded
0.118187 [opencl_init] could not get platforms: -1001
0.118197 [opencl_init] FINALLY: opencl is NOT AVAILABLE on this system.
0.118198 [opencl_init] initial status of opencl enabled flag is OFF.

In the journal I see the following messages when darktable tries to use opencl:

nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)

Is this related to the Phoronix article “Linux 5.9 Brings Safeguard Following NVIDIA's Recent "GPL Condom" Incident”?
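
For anyone checking whether they are hit by the same problem, here is a small sketch that pulls the unresolved symbol names out of kernel log text. The function name `unknown_symbols` is made up for illustration, and it assumes the log lines have the same "nvidia_uvm: Unknown symbol NAME (err -2)" shape as the journal output above.

```shell
# unknown_symbols [MODULE]
# Reads kernel log text on stdin and prints the symbol names that MODULE
# (default: nvidia_uvm) failed to resolve, one per line.
# Hypothetical helper; assumes the "MODULE: Unknown symbol NAME (err N)" format.
unknown_symbols() {
    mod="${1:-nvidia_uvm}"
    sed -n "s/.*${mod}: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p"
}
```

Something like `journalctl -k -b | unknown_symbols` should then list the affected symbols. An empty result would suggest the module did not hit this particular error during the current boot.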

dinosaur · October 13, 2020, 12:13pm

Yes, this is because of the “GPL condom” that prevents loading the UVM module…

BOTH Linux kernel developers and NVIDIA are to blame here.

The first because they behave like OpenSource nazis (and are in the process of violating laws such as EU law, which does NOT allow API copyrighting): imagine what would happen to OpenSource if Micro$oft or Apple suddenly behaved in the exact same way and forbade the use of their OS’ APIs unless the software using them carried the same license… OpenSource software would simply not run any more on Windows PCs and Macs and would be doomed!

The second, because their OpenSource support and production is extremely poor (to say the least) without even a valid reason (why would the competition contribute more if there were any valid reason to hide software behind closed-source doors?), and because instead of discussing the issue like civilized people with the Linux kernel developers, they just try to bypass the existing restrictions rather than consider taking the Open Source route (at least for part of their code: no one would blame them for keeping some code closed as a binary blob, since almost every other hardware manufacturer, including AMD, is doing it).

As an end user, it really pisses me off big time.

So here is my advice, until those “fine people” pertaining to the Linux kernel and NVIDIA devel teams finally find a common, sane ground (i.e. behave as adults instead of like children in a kindergarten), just do like what I did to circumvent this issue:
1.- change the license declared in the NVIDIA kernel module sources from “NVIDIA” to “GPL”
2.- disable the GPL condom code in kernel/module.c (just a couple of #if 0 … #endif to insert; compare the v5.8 and v5.9 kernel sources to find out where to insert those directives).
Either of the two solutions above will let you recover the full functionality of the NVIDIA drivers on your system, and as long as you do not distribute the resulting binaries, no one can blame (or sue) you for it!

fr314159 · October 13, 2020, 7:26pm

Thanks for the tips on how to circumvent.

But could you please elaborate on method #1. Where do I find the file or files that contain the license information?

pobrn · October 13, 2020, 7:31pm

Grep for “MODULE_LICENSE” in the kernel directory after running the installer with --extract-only.
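
A rough sketch of that search, in case it helps. The helper name `find_module_license` and the installer file name in the usage comment are assumptions for illustration; only `--extract-only` and the `kernel` directory come from the post above.

```shell
# find_module_license DIR
# Lists every line under DIR that contains MODULE_LICENSE, prefixed with
# file name and line number, so the declarations can be inspected before
# anything is edited. Hypothetical helper, not part of the NVIDIA installer.
find_module_license() {
    grep -rn 'MODULE_LICENSE' "$1"
}

# Possible usage (installer file name is an assumption):
#   sh NVIDIA-Linux-x86_64-455.28.run --extract-only
#   find_module_license NVIDIA-Linux-x86_64-455.28/kernel
```

The output only shows where the license strings live; which of them need to change is still up to whoever edits the sources.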

fr314159 · October 13, 2020, 9:18pm

I can’t thank you enough.

I found the files, made a patch for Gentoo, and built the drivers. It all functions normally.

The kernel developers should include some kind of “opt out” for users who just want the nvidia driver and do not distribute the product.

mbod · October 14, 2020, 5:53am

Based on what @dinosaur said, I created my own kernel patch to make nvidia drivers work again. The patch reverts the commit which introduced the TAINT stuff:

https://github.com/torvalds/linux/commit/262e6ae7081df304fc625cf368d5c2cbba2bb991

This is my patch:

--- linux-5.9/kernel/module.c.old	2020-10-14 06:51:57.598066293 +0200
+++ linux-5.9/kernel/module.c	2020-10-14 07:58:16.504570606 +0200
@@ -1431,6 +1431,7 @@
 	return 0;
 }
 
+#if 0
 static bool inherit_taint(struct module *mod, struct module *owner)
 {
 	if (!owner || !test_bit(TAINT_PROPRIETARY_MODULE, &owner->taints))
@@ -1449,6 +1450,7 @@
 	}
 	return true;
 }
+#endif
 
 /* Resolve a symbol for this module.  I.e. if we find one, record usage. */
 static const struct kernel_symbol *resolve_symbol(struct module *mod,
@@ -1474,6 +1476,7 @@
 	if (!sym)
 		goto unlock;
 
+#if 0
 	if (license == GPL_ONLY)
 		mod->using_gplonly_symbols = true;
 
@@ -1481,6 +1484,7 @@
 		sym = NULL;
 		goto getname;
 	}
+#endif
 
 	if (!check_version(info, name, mod, crc)) {
 		sym = ERR_PTR(-EINVAL);

It works just fine!
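
For those unsure how to use a patch like this, a minimal sketch follows. It assumes the diff above was saved to a file and that an unpacked linux-5.9 source tree is available; the function name `apply_kernel_patch` and the paths in the usage comment are hypothetical. The kernel still has to be rebuilt and installed afterwards in whatever way your distribution expects.

```shell
# apply_kernel_patch SRC_DIR PATCH_FILE
# Dry-runs the patch inside SRC_DIR first and only modifies the tree if the
# dry run succeeds. PATCH_FILE must be an absolute path because the function
# changes directory. Hypothetical helper for illustration.
apply_kernel_patch() {
    src_dir="$1"
    patch_file="$2"
    # -p1 strips the leading "linux-5.9/" component from the diff headers,
    # so the patch is applied relative to the top of the source tree.
    ( cd "$src_dir" &&
      patch -p1 --dry-run < "$patch_file" > /dev/null &&
      patch -p1 < "$patch_file" )
}

# Possible usage (both paths are assumptions):
#   apply_kernel_patch /usr/src/linux-5.9 /tmp/module-taint.patch
```

The dry run is there so that a patch which no longer matches the sources fails before anything is touched, instead of leaving a half-patched kernel/module.c behind.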

shrisha · October 14, 2020, 6:24am

Is this a patch for the driver or for the kernel?

mbod · October 14, 2020, 7:21am

Is this question for me?

Did I write "I created my own kernel patch "?

I assume that it is illegal to patch the nvidia drivers to claim they are GPL. As a user you can do that, but not as a distro provider.

jeff.chua.linux · October 17, 2020, 7:22am

Patch works, but with latest linux git pull (commit 071a0578b0ce0b0e543d1e38ee6926b9cc21c198), compile fails …

/v6/src/nvidia-455.28/nvidia/nv-dma.c:631:37: error: implicit declaration of function ‘get_dma_ops’; did you mean ‘get_mm_rss’? [-Werror=implicit-function-declaration]
const struct dma_map_ops *ops = get_dma_ops(dma_dev->dev);
^~~~~~~~~~~
get_mm_rss
/v6/src/nvidia-455.28/nvidia/nv-dma.c:631:37: warning: initialization of ‘const struct dma_map_ops *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
/v6/src/nvidia-455.28/nvidia/nv-dma.c:646:16: error: dereferencing pointer to incomplete type ‘const struct dma_map_ops’
return (ops->map_resource != NULL);
^~
/v6/src/nvidia-455.28/nvidia/nv-dma.c:650:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

I think that error can be fixed with this …

--- nvidia-455.28/nvidia/nv-dma.c 2020-10-17 14:36:50.215295676 +0800
+++ nvidia-455.28/nvidia/nv-dma.c 2020-10-17 14:38:42.075296625 +0800
@@ -627,26 +627,7 @@
     nv_dma_device_t *dma_dev
 )
 {
-#if defined(NV_DMA_MAP_RESOURCE_PRESENT)
-    const struct dma_map_ops *ops = get_dma_ops(dma_dev->dev);
-
-    if (ops == NULL)
-    {
-        /* On pre-5.0 kernels, if dma_map_resource() is present, then we
-         * assume that ops != NULL.  With direct_dma handling swiotlb on 5.0+
-         * kernels, ops == NULL.
-         */
-#if defined(NV_DMA_IS_DIRECT_PRESENT)
-        return NV_TRUE;
-#else
-        return NV_FALSE;
-#endif
-    }
-
-    return (ops->map_resource != NULL);
-#else
     return NV_FALSE;
-#endif
 }
 
 /* DMA-map a peer PCI device's BAR for peer access. */

But the following “dev” error … I don’t know how to fix that …

/v6/src/nvidia-455.28/nvidia-drm/nvidia-drm-gem-user-memory.c:63:12: error: too few arguments to function ‘drm_prime_pages_to_sg’
return drm_prime_pages_to_sg(nv_user_memory->pages,
^~~~~~~~~~~~~~~~~~~~~
In file included from /v6/src/nvidia-455.28/nvidia-drm/nvidia-drm-gem-user-memory.c:28:
./include/drm/drm_prime.h:91:18: note: declared here
struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
^~~~~~~~~~~~~~~~~~~~~

Looks like an additional “dev” argument needs to be passed to the function drm_prime_pages_to_sg() …

Jeff

lamarujian · November 13, 2020, 2:26pm

I can’t wait any longer, but I don’t know how to use this patch. Can you please give more detailed steps for patching the kernel? Thanks a lot.
