For the past two days, I have been trying to install cuda-drivers 550.54.15 on a V100 node. Running
`apt-get install cuda-drivers-550=550.54.15` results in a mix of dependency versions, with some dependencies resolving to 550.90.x.
I then tried specifying the version of each package explicitly:
```
apt-get install -y \
  cuda-drivers-550=550.54.15-1 \
  nvidia-driver-550=550.54.15-0ubuntu1 \
  libnvidia-fbc1-550=550.54.15-0ubuntu1 \
  xserver-xorg-video-nvidia-550=550.54.15-0ubuntu1 \
  libnvidia-cfg1-550=550.54.15-0ubuntu1 \
  nvidia-utils-550=550.54.15-0ubuntu1 \
  libnvidia-encode-550=550.54.15-0ubuntu1 \
  libnvidia-decode-550=550.54.15-0ubuntu1 \
  nvidia-compute-utils-550=550.54.15-0ubuntu1 \
  libnvidia-extra-550=550.54.15-0ubuntu1 \
  nvidia-dkms-550=550.54.15-0ubuntu1 \
  nvidia-kernel-common-550=550.54.15-0ubuntu1 \
  nvidia-kernel-source-550=550.54.15-0ubuntu1 \
  libnvidia-gl-550=550.54.15-0ubuntu1 \
  libnvidia-compute-550=550.54.15-0ubuntu1 \
  libnvidia-common-550=550.54.15-0ubuntu1 \
  xserver-xorg-core \
  nvidia-firmware-550-550.54.15=550.54.15-0ubuntu1
```
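
Since that command repeats the same version string for almost every package, the argument list could also be generated rather than typed by hand. A minimal sketch, assuming a hypothetical helper named `pin_args` (my own name, not part of apt) and using only a few of the package names from the command above:

```shell
# Hypothetical helper: print "pkg=version" for each package name given.
# $1 is the version string, the remaining arguments are package names.
pin_args() {
  version="$1"
  shift
  for pkg in "$@"; do
    printf '%s=%s\n' "$pkg" "$version"
  done
}

# Show the generated arguments for a few of the packages listed above.
pin_args 550.54.15-0ubuntu1 nvidia-driver-550 nvidia-dkms-550 nvidia-utils-550

# Dry run only: -s makes apt-get simulate, so nothing is installed.
# apt-get install -s $(pin_args 550.54.15-0ubuntu1 nvidia-driver-550 nvidia-dkms-550 nvidia-utils-550)
```

Packages with a different version suffix (such as `cuda-drivers-550=550.54.15-1`) or with no version at all (`xserver-xorg-core`) would still have to be added to the command separately.
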
The install command appears to succeed, but after a reboot `nvidia-smi` fails:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

`lsmod | grep nvidia` returns nothing.

`dpkg -l | grep nvidia`:
ii libnvidia-cfg1-550:amd64 550.54.15-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-550 550.54.15-0ubuntu1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-550:amd64 550.54.15-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-decode-550:amd64 550.54.15-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-550:amd64 550.54.15-0ubuntu1 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-550:amd64 550.54.15-0ubuntu1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-550:amd64 550.54.15-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-550:amd64 550.54.15-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii nvidia-compute-utils-550 550.54.15-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-dkms-550 550.54.15-0ubuntu1 amd64 NVIDIA DKMS package
ii nvidia-driver-550 550.54.15-0ubuntu1 amd64 NVIDIA driver metapackage
ii nvidia-firmware-550-550.54.15 550.54.15-0ubuntu1 amd64 Firmware files used by the kernel module
ii nvidia-kernel-common-550 550.54.15-0ubuntu1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-550 550.54.15-0ubuntu1 amd64 NVIDIA kernel source package
ii nvidia-utils-550 550.54.15-0ubuntu1 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-550 550.54.15-0ubuntu1 amd64 NVIDIA binary Xorg driver
`dkms status`:
lustre-client-modules/2.15.3, 5.15.0-112-generic, x86_64: installed
lustre-client-modules/2.15.3, 5.15.0-113-generic, x86_64: installed
nvidia/550.54.15, 5.15.0-113-generic, x86_64: installed
`grep -R nvidia /etc/modprobe.d/`:
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-graphics-drivers-kms.conf:# This file was generated by nvidia-driver-550
/etc/modprobe.d/nvidia-graphics-drivers-kms.conf:options nvidia-drm modeset=1
`sudo modprobe -vv nvidia`:
modprobe: INFO: ../libkmod/libkmod.c:367 kmod_set_log_fn() custom logging function 0x557105435830 registered
insmod /lib/modules/5.15.0-113-generic/updates/dkms/nvidia.ko
modprobe: INFO: ../libkmod/libkmod-module.c:892 kmod_module_insert_module() Failed to insert module '/lib/modules/5.15.0-113-generic/updates/dkms/nvidia.ko': Exec format error
modprobe: ERROR: could not insert 'nvidia': Exec format error
modprobe: INFO: ../libkmod/libkmod.c:334 kmod_unref() context 0x5571072ba460 released
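
One further check that might narrow down the `Exec format error` is to compare the module's `vermagic` string with the running kernel release. This is only a sketch: it assumes the error comes from a module/kernel mismatch, which the output above does not confirm, and `vermagic_matches` is a hypothetical helper name of my own.

```shell
# Hypothetical helper: succeed if the first word of a module's vermagic
# string equals the given kernel release.
vermagic_matches() {
  vermagic="$1"
  kernel="$2"
  # vermagic looks like "5.15.0-113-generic SMP mod_unload modversions";
  # strip everything from the first space onward before comparing.
  [ "${vermagic%% *}" = "$kernel" ]
}

# Example against the running system (module path as in the modprobe output):
# mod="/lib/modules/$(uname -r)/updates/dkms/nvidia.ko"
# if vermagic_matches "$(modinfo -F vermagic "$mod")" "$(uname -r)"; then
#   echo "vermagic matches the running kernel"
# else
#   echo "vermagic differs from the running kernel"
# fi
```

A match would not rule out other causes; it would only exclude a module built for a different kernel release.
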
`dmesg | grep nvidia`:
[ 8.779041] nvidia: loading out-of-tree module taints kernel.
[ 8.779549] nvidia: module license 'NVIDIA' taints kernel.
[ 8.811504] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 14.485991] audit: type=1400 audit(1721046506.216:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=5625 comm="apparmor_parser"
[ 14.485996] audit: type=1400 audit(1721046506.216:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=5625 comm="apparmor_parser"
How can I solve this issue?