openSUSE Tumbleweed, kernel 5.0.5-1: nvidia-uvm module 418.56 does not load - Unknown symbol __pcpu_...

Hello!

I installed a GTX 1660 Ti in my Linux system and tried the latest NVIDIA 418.56 driver provided by my distribution. The driver itself runs great and 3D acceleration works well. However, I get an error when trying to load the nvidia-uvm kernel module:

foxhome:~ # uname -a
Linux foxhome 5.0.5-1-default #1 SMP Wed Mar 27 11:22:35 UTC 2019 (0fb0b14) x86_64 x86_64 x86_64 GNU/Linux

foxhome:~ # modprobe nvidia-uvm ; dmesg | tail -n1
2019-04-09T22:48:31.310000+02:00 foxhome kernel: [35249.586545] nvidia_uvm: Unknown symbol __pcpu_unique_interrupt_thread_context (err -2)
modprobe: ERROR: could not insert 'nvidia_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
[35249.586545] nvidia_uvm: Unknown symbol __pcpu_unique_interrupt_thread_context (err -2)

According to my logfiles, the nvidia-uvm kernel module used to load fine on Kernel 4.16 with NVidia 390.59 and a GTX970 installed.

As a test I built the driver from source, but this gave the same result. I looked all over the net for the missing symbol, but it seems nobody ever mentioned it anywhere. I also noticed that I can’t use CUDA functions anymore. Is that a result of the nvidia-uvm.ko not loading?

I appreciate your time and any hints on how to fix this issue!

Thank you,
Dario

The nvidia-uvm module is a required part of CUDA, so CUDA will not work until it loads. The unknown symbol __pcpu_unique_interrupt_thread_context means the module can’t resolve a symbol that belongs to its own per-CPU variable interrupt_thread_context. It looks like something changed in kernel 5.0 in that regard, so the driver needs patching. The installer or DKMS log may contain more information.
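
To get from the log message to the variable name worth searching for in the driver source, a little text processing helps. This is only a minimal sketch: the function names unknown_symbols and percpu_var_name are made up for illustration, and it assumes the kernel log lines have the "Unknown symbol NAME (err N)" shape shown in the first post.

```shell
#!/bin/sh
# Read kernel log text on stdin and print each unresolved symbol once.
# Only lines of the form "... Unknown symbol NAME (err N)" are matched,
# so modprobe's own "Unknown symbol in module" message is ignored.
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_.]*\) (err .*/\1/p' | sort -u
}

# Read symbol names on stdin and, for those starting with
# __pcpu_unique_, print the per-CPU variable name that follows.
percpu_var_name() {
    sed -n 's/^__pcpu_unique_//p'
}

# Typical use (reading the kernel log may require root):
#   dmesg | unknown_symbols | percpu_var_name
```

The resulting name can then be searched for in the kernel/nvidia-uvm directory of the unpacked driver.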

Thank you for the reply!

I can see this in the driver source:

foxhome:/installs/nvidia/NVIDIA-Linux-x86_64-418.56/kernel/nvidia-uvm # fgrep interrupt_thread_context *
grep: hwref: Is a directory
uvm8_thread_context.c:static DEFINE_PER_CPU(uvm_thread_context_t, interrupt_thread_context);
uvm8_thread_context.c: thread_context = &get_cpu_var(interrupt_thread_context);
uvm8_thread_context.c: put_cpu_var(interrupt_thread_context);
Binary file uvm8_thread_context.o matches

I assume this is where the symbol is created. Any hint on what to change that might make it work? I know my way around C and C++ but the kernel always confused me :-)

Thanks again!

I suspect that’s a symptom rather than the cause; the installer log should contain the compiler messages.

The compiler runs through the code without error. I also compiled the respective parts manually to see what’s happening, and it just builds the module. The error only comes up when you try to modprobe/insmod it.

Thanks for the response!
Dario

The only recent kernel change to include/linux/percpu-defs.h was this:
[url]https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/include/linux/percpu-defs.h?h=v5.0.7&id=69a60bc75fe73511af89328ded1b33bc4a625a5c[/url]
It doesn’t really help me make heads or tails of it; the per-CPU variable macros are not very intuitive.

Yeah, that doesn’t look like it would change anything with symbol exporting. I am trying to find examples of other modules using per CPU variables, but so far it seems that it’s normally used the other way around, defined by the kernel and exported to be used by modules … all very strange :-)

Just updated to kernel 5.0.6 without success. The only real difference I can see between the compiled 4.16 modules and the 5.0.6 modules is that 4.16 shows “preempt” in modinfo’s vermagic tag, while 5.0.6 does not.
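
In case anyone wants to repeat the vermagic comparison, here is a minimal sketch. It assumes a vermagic string is a kernel release followed by space-separated build flags; the function name vermagic_missing_flags is hypothetical, and the strings in the usage comment are illustrative values, not copied from my systems.

```shell
#!/bin/sh
# Print every flag that appears in the first vermagic string
# but not in the second, one per line.
vermagic_missing_flags() {
    old=$1
    new=$2
    # Split the first string into words; the leading word is the
    # kernel release, which always differs, so drop it.
    set -- $old
    [ $# -gt 0 ] && shift
    for flag in "$@"; do
        case " $new " in
            *" $flag "*) ;;                  # flag is present in both
            *) printf '%s\n' "$flag" ;;      # flag only in the first
        esac
    done
}

# Illustrative call with made-up vermagic values:
#   vermagic_missing_flags \
#       "4.16.12-1-default SMP preempt mod_unload modversions" \
#       "5.0.6-1-default SMP mod_unload modversions"
```

With real modules, the two arguments could come from `modinfo -F vermagic` run on each .ko file.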

Hi!
I have the same problem.

#prime-select nvidia:

modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/5.0.5-1-default

ERROR: Unable to query GPU information

PCI BusID of NVIDIA card could not be detected!

zchronos, looks like you’re using Ubuntu. You need the 418.56 driver from ppa for kernel 5.0.
[url]https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa[/url]

generix, I’m using openSUSE Tumbleweed (clean install):
Plasma: 5.15.3
Kernel: 5.0.5-1-default
Video: Intel / Nvidia 1050 Ti (4GB)

zchronos, on openSUSE it’s best to install the driver via YaST or zypper:

zypper in nvidia-gfxG05-kmp

It would be interesting to see if you also get the problem that nvidia-uvm doesn’t load …

Hi!
Of course, I installed the nvidia driver from the repositories.

# zypper lr
Repository priorities in effect:                                (See 'zypper lr -P' for details)
      90 (raised priority)  :  1 repository
      99 (default priority) :  5 repositories

# | Alias               | Name                        | Enabled | GPG Check | Refresh
--+---------------------+-----------------------------+---------+-----------+--------
1 | NVIDIA              | NVIDIA                      | Yes     | (r ) Yes  | Yes
2 | google-chrome       | google-chrome               | Yes     | ( p) Yes  | Yes
3 | openSUSE-20190402-0 | openSUSE-20190402-0         | No      | ----      | ----
4 | packman             | packman                     | Yes     | (r ) Yes  | Yes
5 | repo-non-oss        | openSUSE-Tumbleweed-Non-Oss | Yes     | (r ) Yes  | Yes
6 | repo-oss            | openSUSE-Tumbleweed-Oss     | Yes     | (r ) Yes  | Yes
7 | repo-source         | openSUSE-Tumbleweed-Source  | No      | ----      | ----
8 | repo-update         | openSUSE-Tumbleweed-Update  | Yes     | (r ) Yes  | Yes

Also:

# modprobe nvidia-uvm
modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/5.0.5-1-default

It should be here:

# ls /lib/modules/5.0.5-1-default/updates/
nvidia-drm.ko  nvidia-modeset.ko  nvidia-uvm.ko  nvidia.ko

And referenced here (before being recompiled to end up in 5.0.5):

# rpm -ql nvidia-gfxG05-kmp-default | fgrep uvm | fgrep .ko
/lib/modules/5.0.3-1-default/updates/nvidia-uvm.ko

So if these are not there I assume the compile run failed and you didn’t get the drivers to install correctly. Have a look at your y2log after installing the driver.
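
A quick way to check for the four files is sketched below. It is only a sketch: the function name missing_nvidia_modules is made up, and it assumes the modules are installed as uncompressed .ko files with the names from the listing above.

```shell
#!/bin/sh
# Print the name of each expected NVIDIA module file that is
# missing from the given directory; print nothing if all exist.
missing_nvidia_modules() {
    dir=$1
    for mod in nvidia nvidia-drm nvidia-modeset nvidia-uvm; do
        [ -e "$dir/$mod.ko" ] || printf '%s\n' "$mod.ko"
    done
}

# Typical use for the running kernel:
#   missing_nvidia_modules "/lib/modules/$(uname -r)/updates"
```

Any file name it prints points to a module that was not built or installed for that kernel.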

Well, first I disabled the NVIDIA repository, updated the system (zypper dup), cleaned up the kernels, re-enabled the NVIDIA repository, installed nvidia-gfxG05-kmp and… it works!

# inxi -Gx
Graphics:  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 
           Device-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] vendor: Dell driver: nvidia v: 418.56 bus ID: 01:00.0 
           Display: x11 server: X.org 1.20.4 driver: modesetting,nvidia resolution: <xdpyinfo missing> 
           OpenGL: renderer: GeForce GTX 1050 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 418.56 direct render: Yes

Thanks!

Just for completeness, could you do a:

lsmod | fgrep nvidia

and

modprobe -v nvidia-uvm

Thanks!

# lsmod | fgrep nvidia
nvidia_drm             53248  7
nvidia_modeset       1089536  23 nvidia_drm
nvidia_uvm            925696  0
nvidia              17641472  1228 nvidia_uvm,nvidia_modeset
drm_kms_helper        204800  2 nvidia_drm,i915
drm                   499712  11 drm_kms_helper,nvidia_drm,i915
ipmi_msghandler        65536  2 ipmi_devintf,nvidia

modprobe -v nvidia-uvm

(Nothing)

Thank you for your feedback! Seems like for you the nvidia-uvm module loaded just fine. I wonder what my kernel/driver is missing.

Maybe something is triggered by your specific hardware.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
[url]https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/[/url]

I don’t really think so as the problem arises at the linker level, but I’ll attach it to this post. Maybe I am not seeing the whole picture here.

[Edit:] I also tried recompiling the driver with gcc-7 just now, same issue.

nvidia-bug-report.log.gz (1.01 MB)