I installed a GTX 1660 Ti in my Linux system and tried running the latest NVIDIA 418.56 driver provided by my distribution. The driver itself runs great and 3D acceleration works well. However, I get an error when trying to load the nvidia-uvm kernel module:
foxhome:~ # uname -a
Linux foxhome 5.0.5-1-default #1 SMP Wed Mar 27 11:22:35 UTC 2019 (0fb0b14) x86_64 x86_64 x86_64 GNU/Linux
foxhome:~ # modprobe nvidia-uvm ; dmesg | tail -n1
2019-04-09T22:48:31.310000+02:00 foxhome kernel: [35249.586545] nvidia_uvm: Unknown symbol __pcpu_unique_interrupt_thread_context (err -2)
modprobe: ERROR: could not insert 'nvidia_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
[35249.586545] nvidia_uvm: Unknown symbol __pcpu_unique_interrupt_thread_context (err -2)
According to my logfiles, the nvidia-uvm kernel module used to load fine on Kernel 4.16 with NVidia 390.59 and a GTX970 installed.
As a test I built the driver from source, but this gave the same result. I looked all over the net for the missing symbol, but it seems nobody ever mentioned it anywhere. I also noticed that I can’t use CUDA functions anymore. Is that a result of the nvidia-uvm.ko not loading?
I appreciate your time and any hints on how to fix this issue!
The uvm module is a required part of CUDA. The unknown symbol __pcpu_unique_interrupt_thread_context means the module can’t resolve its per-CPU variable interrupt_thread_context. It looks like something changed in kernel 5.0 in that regard, so the driver needs patching. The installer or dkms log may contain more info.
Thank you for the reply!
I can see this in the driver source:
foxhome:/installs/nvidia/NVIDIA-Linux-x86_64-418.56/kernel/nvidia-uvm # fgrep interrupt_thread_context *
grep: hwref: Is a directory
uvm8_thread_context.c:static DEFINE_PER_CPU(uvm_thread_context_t, interrupt_thread_context);
uvm8_thread_context.c: thread_context = &get_cpu_var(interrupt_thread_context);
Binary file uvm8_thread_context.o matches
I assume this is where the symbol is created. Any hint on what to change that might make it work? I know my way around C and C++ but the kernel always confused me :-)
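A small sketch of the same search, in case it helps others hitting a similar message. It assumes the unresolved symbol is simply the per-CPU variable name with a `__pcpu_unique_` prefix, which matches the grep result above. The function name `percpu_var_name` is made up for this example.

```shell
# Hypothetical helper: strip the "__pcpu_unique_" prefix from a symbol name
# to recover the variable name that appears in the DEFINE_PER_CPU line.
percpu_var_name() {
    printf '%s\n' "${1#__pcpu_unique_}"
}
```

From the nvidia-uvm source directory, something like `grep -rnF "$(percpu_var_name __pcpu_unique_interrupt_thread_context)" .` should then list the matching lines; `-r` descends into subdirectories such as hwref instead of reporting "Is a directory".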
I suspect it’s rather a symptom, the installer log should contain the compiler messages.
The compiler runs through the code without error. I also compiled the respective parts manually to see what’s happening, and it just builds the module. Only when you try and modprobe/insmod it the error comes up.
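That matches how module loading works as far as I understand it: references to symbols outside the module are only resolved when the module is inserted, so a clean build does not prove that every symbol is available. A minimal sketch for pulling the missing names out of the kernel log follows; the function name `unknown_symbols` is made up for this example, and it assumes the "Unknown symbol ... (err N)" message format shown earlier in this thread.

```shell
# Hypothetical helper: read kernel log lines on stdin and print each
# distinct symbol name reported as "Unknown symbol <name> (err N)".
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([^ ]*\) (err.*/\1/p' | sort -u
}
```

Typical use would be `dmesg | unknown_symbols`, followed by `grep -w <name> /proc/kallsyms` to see whether the running kernel provides that symbol at all.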
Thanks for the response!
The only recent change in the kernel was this:
It doesn’t really help me make heads or tails of it; the per-CPU variable macro stuff is not really intuitive.
Yeah, that doesn’t look like it would change anything with symbol exporting. I am trying to find examples of other modules using per CPU variables, but so far it seems that it’s normally used the other way around, defined by the kernel and exported to be used by modules … all very strange :-)
Just updated to kernel 5.0.6 without success. The only real difference I can see between the compiled 4.16 modules and the 5.0.6 modules is that 4.16 shows “preempt” in modinfo’s vermagic tag, while 5.0.6 does not.
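To make that comparison repeatable, here is a rough sketch that prints the vermagic tokens present in one string but missing from the other. The function name `vermagic_only_in_first` is made up, and any strings passed to it are up to the caller; nothing here is taken from the actual modules beyond the "preempt" observation above.

```shell
# Hypothetical helper: print each whitespace-separated token of the first
# vermagic string ($1) that does not occur in the second one ($2).
vermagic_only_in_first() {
    for token in $1; do
        case " $2 " in
            *" $token "*) ;;                 # token is in both strings
            *) printf '%s\n' "$token" ;;     # token is only in the first
        esac
    done
}
```

The inputs can come from `modinfo -F vermagic /path/to/nvidia-uvm.ko` for each of the two builds. A token such as "preempt" showing up on only one side indicates that the two kernels were built with different preemption settings.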
I have the same problem.
modprobe: FATAL: Module nvidia_drm not found in directory /lib/modules/5.0.5-1-default
ERROR: Unable to query GPU information
PCI BusID of NVIDIA card could not be detected!
zchronos, looks like you’re using Ubuntu. You need the 418.56 driver from ppa for kernel 5.0.
generix, I’m using openSUSE Tumbleweed (clean install):
Video: Intel / Nvidia 1050 Ti (4GB)
zchronos, on openSUSE it’s best to install the driver via YaST or zypper:
zypper in nvidia-gfxG05-kmp
It would be interesting to see if you also get the problem that nvidia-uvm doesn’t load …
Of course, I installed the nvidia driver from the repositories.
# zypper lr
Repository priorities in effect: (See 'zypper lr -P' for details)
90 (raised priority) : 1 repository
99 (default priority) : 5 repositories
# | Alias | Name | Enabled | GPG Check | Refresh
1 | NVIDIA | NVIDIA | Yes | (r ) Yes | Yes
2 | google-chrome | google-chrome | Yes | ( p) Yes | Yes
3 | openSUSE-20190402-0 | openSUSE-20190402-0 | No | ---- | ----
4 | packman | packman | Yes | (r ) Yes | Yes
5 | repo-non-oss | openSUSE-Tumbleweed-Non-Oss | Yes | (r ) Yes | Yes
6 | repo-oss | openSUSE-Tumbleweed-Oss | Yes | (r ) Yes | Yes
7 | repo-source | openSUSE-Tumbleweed-Source | No | ---- | ----
8 | repo-update | openSUSE-Tumbleweed-Update | Yes | (r ) Yes | Yes
# modprobe nvidia-uvm
modprobe: FATAL: Module nvidia-uvm not found in directory /lib/modules/5.0.5-1-default
It should be here:
# ls /lib/modules/5.0.5-1-default/updates/
nvidia-drm.ko nvidia-modeset.ko nvidia-uvm.ko nvidia.ko
And referenced here (before being recompiled to end up in 5.0.5):
# rpm -ql nvidia-gfxG05-kmp-default | fgrep uvm | fgrep .ko
So if these are not there I assume the compile run failed and you didn’t get the drivers to install correctly. Have a look at your y2log after installing the driver.
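A quick way to check this without knowing the exact subdirectory is to search the whole module tree for the running kernel. This is only a sketch; the function name `list_nvidia_modules` is made up, and the `nvidia*.ko*` pattern assumes the module files keep the names shown in the listing above (possibly with a compression suffix).

```shell
# Hypothetical helper: list all NVIDIA kernel module files below the
# module tree given as $1, e.g. /lib/modules/$(uname -r).
list_nvidia_modules() {
    find "$1" -type f -name 'nvidia*.ko*' | sort
}
```

Running `list_nvidia_modules "/lib/modules/$(uname -r)"` should print the four .ko files if the build went through; empty output would point to a failed build for that kernel.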
Well, first I disabled the nvidia repository, updated the system (zypper dup), cleaned out the old kernels, re-enabled the nvidia repository, installed nvidia-gfxG05-kmp and… it works!
# inxi -Gx
Graphics: Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0
Device-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] vendor: Dell driver: nvidia v: 418.56 bus ID: 01:00.0
Display: x11 server: X.org 1.20.4 driver: modesetting,nvidia resolution: <xdpyinfo missing>
OpenGL: renderer: GeForce GTX 1050 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 418.56 direct render: Yes
Just for completeness, could you do a:
lsmod | fgrep nvidia
modprobe -v nvidia-uvm
# lsmod | fgrep nvidia
nvidia_drm 53248 7
nvidia_modeset 1089536 23 nvidia_drm
nvidia_uvm 925696 0
nvidia 17641472 1228 nvidia_uvm,nvidia_modeset
drm_kms_helper 204800 2 nvidia_drm,i915
drm 499712 11 drm_kms_helper,nvidia_drm,i915
ipmi_msghandler 65536 2 ipmi_devintf,nvidia
modprobe -v nvidia-uvm
Thank you for your feedback! Seems like for you the nvidia-uvm module loaded just fine. I wonder what my kernel/driver is missing.
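For anyone who wants to script this check, a minimal sketch: the function below reads lsmod-style output and reports through its exit status whether a given module is listed. The function name `module_listed` is made up for this example. Note that lsmod prints module names with underscores (nvidia_uvm), even though modprobe also accepts the hyphenated form.

```shell
# Hypothetical helper: read lsmod-style output on stdin and return success
# only if the first column of some line equals the module name in $1.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}
```

Usage would be along the lines of `lsmod | module_listed nvidia_uvm && echo "nvidia_uvm is loaded"`. Matching on the first column only avoids a false positive from the "used by" column of the nvidia line.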
Maybe something is triggered by your specific hardware.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
I don’t really think so as the problem arises at the linker level, but I’ll attach it to this post. Maybe I am not seeing the whole picture here.
[Edit:] I also tried recompiling the driver with gcc-7 just now, same issue.
nvidia-bug-report.log.gz (1.01 MB)