Ubuntu 20.04 - CUDA 11.1.1: Missing nvidia-uvm

Hey,

I’m trying to access my RTX 2080 Ti GPU using a self-built version of TensorFlow 2.4.0-rc1, but I’m getting the following error:

2020-11-12 16:12:21.876416: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-12 16:12:21.957919: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2020-11-12 16:12:21.958051: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: hoopoe-u-u
2020-11-12 16:12:21.958082: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: hoopoe-u-u
2020-11-12 16:12:21.958297: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 455.32.0
2020-11-12 16:12:21.958396: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 455.32.0
2020-11-12 16:12:21.958423: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 455.32.0

After running ls /dev/nvidia*, these are the files present:

/dev/nvidia0  /dev/nvidiactl  /dev/nvidia-modeset

/dev/nvidia-caps:
nvidia-cap1  nvidia-cap2
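The listing above has no /dev/nvidia-uvm entry, which is the node this post is about. A minimal sketch of the same check in Python follows; the EXPECTED_NODES tuple and the missing_nodes helper are hypothetical names, and the set of expected nodes is an assumption for a single-GPU machine.

```python
import glob

# Assumption: the device nodes a single-GPU CUDA setup is expected to expose.
EXPECTED_NODES = ("/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm")


def missing_nodes(present, expected=EXPECTED_NODES):
    """Return the expected device nodes that are absent from `present`."""
    present = set(present)
    return [node for node in expected if node not in present]


if __name__ == "__main__":
    # Same glob as `ls /dev/nvidia*`, without descending into directories.
    found = glob.glob("/dev/nvidia*")
    print("present:", sorted(found))
    print("missing:", missing_nodes(found))
```

Fed the three files listed above, missing_nodes reports only /dev/nvidia-uvm.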

After running the Device Node Verification script for enabling/fixing nvidia_uvm, I get:

mknod: /dev/nvidia0: File exists
mknod: /dev/nvidiactl: File exists
modprobe: ERROR: could not insert 'nvidia_uvm': Unknown symbol in module, or unknown parameter (see dmesg)

Then, checking dmesg, the last lines of output are:

[ 4025.104864] audit: type=1400 audit(1605194221.534:3846): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_nss" name="/proc/16980/cmdline" pid=811 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4025.105569] audit: type=1400 audit(1605194221.534:3847): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_nss" name="/proc/16981/cmdline" pid=811 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4037.833657] audit: type=1400 audit(1605194234.263:3848): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_nss" name="/proc/16987/cmdline" pid=811 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=1168429606
[ 4043.073973] audit: type=1400 audit(1605194239.503:3849): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_nss" name="/proc/17003/cmdline" pid=811 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4043.085345] audit: type=1400 audit(1605194239.515:3850): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_pam" name="/proc/17003/cmdline" pid=812 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4043.136344] audit: type=1400 audit(1605194239.567:3851): apparmor="ALLOWED" operation="mknod" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/var/lib/sss/pubconf/.krb5info_dummy_HvXPyr" pid=810 comm="sssd_be" requested_mask="c" denied_mask="c" fsuid=0 ouid=0
[ 4043.136346] audit: type=1400 audit(1605194239.567:3852): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/var/lib/sss/pubconf/.krb5info_dummy_HvXPyr" pid=810 comm="sssd_be" requested_mask="wrc" denied_mask="wrc" fsuid=0 ouid=0
[ 4043.136347] audit: type=1400 audit(1605194239.567:3853): apparmor="ALLOWED" operation="chmod" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/var/lib/sss/pubconf/.krb5info_dummy_HvXPyr" pid=810 comm="sssd_be" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
[ 4043.136348] audit: type=1400 audit(1605194239.567:3854): apparmor="ALLOWED" operation="rename_src" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/var/lib/sss/pubconf/.krb5info_dummy_HvXPyr" pid=810 comm="sssd_be" requested_mask="wrd" denied_mask="wrd" fsuid=0 ouid=0
[ 4043.136349] audit: type=1400 audit(1605194239.567:3855): apparmor="ALLOWED" operation="rename_dest" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/var/lib/sss/pubconf/kdcinfo.AD.IGD.FRAUNHOFER.DE" pid=810 comm="sssd_be" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=0
[ 4043.137317] audit: type=1400 audit(1605194239.567:3856): apparmor="ALLOWED" operation="exec" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/libexec/sssd/ldap_child" pid=17008 comm="sssd_be" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be//null-/usr/libexec/sssd/ldap_child"
[ 4043.137892] audit: type=1400 audit(1605194239.567:3857): apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be//null-/usr/libexec/sssd/ldap_child" name="/usr/libexec/sssd/ldap_child" pid=17008 comm="ldap_child" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4043.137894] audit: type=1400 audit(1605194239.567:3858): apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be//null-/usr/libexec/sssd/ldap_child" name="/usr/lib/x86_64-linux-gnu/ld-2.31.so" pid=17008 comm="ldap_child" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4043.442952] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 4043.443113] nvidia_uvm: Unknown symbol radix_tree_preloads (err -2)
[ 4043.443142] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
[ 4043.443175] nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
[ 4043.443253] nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
[ 4087.312635] kauditd_printk_skb: 86 callbacks suppressed
[ 4087.312636] audit: type=1400 audit(1605194283.743:3945): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_nss" name="/proc/17046/cmdline" pid=811 comm="sssd_nss" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4087.314778] audit: type=1400 audit(1605194283.743:3946): apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_pam" name="/proc/17046/cmdline" pid=812 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 4087.344651] audit: type=1400 audit(1605194283.775:3947): apparmor="ALLOWED" operation="capable" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" pid=810 comm="sssd_be" capability=2  capname="dac_read_search"
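Most of that output is AppArmor audit noise; the relevant part is the four "Unknown symbol" lines from nvidia_uvm. A small sketch for pulling such lines out of saved dmesg text could look like this. The unresolved_symbols helper and the regular expression are hypothetical and written only against the line format shown above.

```python
import re

# Matches lines such as:
#   [ 4043.443113] nvidia_uvm: Unknown symbol radix_tree_preloads (err -2)
UNKNOWN_SYMBOL = re.compile(r"(\w+): Unknown symbol (\w+) \(err (-?\d+)\)")

# Two lines copied from the dmesg output above, used as sample input.
SAMPLE = """\
[ 4043.442952] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 4043.443113] nvidia_uvm: Unknown symbol radix_tree_preloads (err -2)
"""


def unresolved_symbols(dmesg_text):
    """Return (module, symbol) pairs for every 'Unknown symbol' line."""
    return [(m.group(1), m.group(2)) for m in UNKNOWN_SYMBOL.finditer(dmesg_text)]


if __name__ == "__main__":
    for module, symbol in unresolved_symbols(SAMPLE):
        print(f"{module} cannot resolve {symbol}")
```

On the full log this would list radix_tree_preloads, set_cpus_allowed_ptr, mmu_notifier_unregister and __mmu_notifier_register, all for nvidia_uvm.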

I’m trying to enable TensorFlow GPU access, and fixing nvidia_uvm might be the solution. Could you please help me with this? The TensorFlow Python wheel I built was tested in an nvidia-docker container with CUDA 11.1.1 and Ubuntu 20.04 and worked fine. However, there is a problem accessing the GPU on my host machine outside the container …

SOLVED. The problem was caused by updating my Linux kernel to the latest version (5.9.8). Rolling back to v5.4 fixed it, since nvidia_uvm, which CUDA needs, is not supported on the latest unsigned Linux kernel.
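For anyone hitting the same thing, a quick way to see whether the running kernel is newer than the one that worked for me is sketched below. The helper names are hypothetical, and the (5, 4) threshold is only what worked in this post with driver 455.32, not an official compatibility limit.

```python
import platform
import re


def parse_kernel_release(release):
    """Turn a release string such as '5.9.8' into a (major, minor, patch) tuple."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor or 0), int(patch or 0))


def newer_than_known_good(release, known_good=(5, 4)):
    """True if the kernel's major.minor is above the version that worked here."""
    return parse_kernel_release(release)[:2] > known_good


if __name__ == "__main__":
    running = platform.release()
    print(running, "newer than 5.4:", newer_than_known_good(running))
```

If this reports a newer kernel and modprobe nvidia_uvm fails with unknown symbols, booting the older kernel is a cheap thing to try first.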