Hi, we tried to install the 550 drivers on our HPC system, and it broke the nvidia-smi command. We’re running Ubuntu 23.04 and were trying to update the NVIDIA drivers, CUDA, and cuDNN to support our modeling pipeline.
When I run the nvidia-smi command, I get:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I’ve purged the NVIDIA files and reinstalled them from scratch multiple times, but the issue persists. I finally tried installing the 535 driver instead, but it still doesn’t work. I’m attaching the NVIDIA bug report, and it would be great if anyone could help me with this. Many thanks in advance!
nvidia-bug-report.log.gz (208.3 KB)
GPU: RTX A4000
dkms status
nvidia/535.146.02, 6.2.0-39-generic, x86_64: installed
dpkg -l | grep nvidia
ii libnvidia-cfg1-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-535 535.146.02-0ubuntu0.23.04.1 all Shared files used by the NVIDIA libraries
rc libnvidia-compute-525:amd64 525.147.05-0ubuntu0.23.04.1 amd64 NVIDIA libcompute package
ii libnvidia-compute-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA libcompute package
ii libnvidia-decode-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-egl-wayland1:amd64 1:1.1.10-1 amd64 Wayland EGL External Platform library -- shared library
ii libnvidia-encode-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVENC Video Encoding runtime library
ii libnvidia-extra-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-535:amd64 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
rc linux-modules-nvidia-525-6.2.0-39-generic 6.2.0-39.40+2 amd64 Linux kernel nvidia modules for version 6.2.0-39
ii linux-objects-nvidia-525-6.2.0-39-generic 6.2.0-39.40+2 amd64 Linux kernel nvidia modules for version 6.2.0-39 (objects)
ii linux-signatures-nvidia-6.2.0-39-generic 6.2.0-39.40+2 amd64 Linux kernel signatures for nvidia modules for version 6.2.0-39-generic
rc nvidia-compute-utils-525 525.147.05-0ubuntu0.23.04.1 amd64 NVIDIA compute utilities
ii nvidia-compute-utils-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA compute utilities
ii nvidia-dkms-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA DKMS package
ii nvidia-driver-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA driver metapackage
ii nvidia-firmware-535-535.146.02 535.146.02-0ubuntu0.23.04.1 amd64 Firmware files used by the kernel module
rc nvidia-kernel-common-525 525.147.05-0ubuntu0.23.04.1 amd64 Shared files used with the kernel module
ii nvidia-kernel-common-535 535.146.02-0ubuntu0.23.04.1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA kernel source package
ii nvidia-prime 0.8.17.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 510.47.03-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA driver support binaries
ii screen-resolution-extra 0.18.3 all Extension for the nvidia-settings control panel
ii xserver-xorg-video-nvidia-535 535.146.02-0ubuntu0.23.04.1 amd64 NVIDIA binary Xorg driver
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)
cc -v
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)
sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
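The "Key was rejected by service" message is commonly seen when Secure Boot is enabled and the kernel refuses a module whose signing key is not enrolled. That is an assumption here, not something the output above confirms. A minimal sketch of two checks that could narrow it down follows. The function name `check_module_signing` is hypothetical; `mokutil --sb-state` and `modinfo -F signer` are the standard tools, and each check falls back to a message if the tool or the module is missing.

```shell
#!/bin/sh
# Hypothetical helper: report the Secure Boot state and the signer recorded
# in the installed nvidia kernel module, without failing if a tool is absent.
check_module_signing() {
    if command -v mokutil >/dev/null 2>&1; then
        # First line of mokutil output, e.g. the enabled/disabled state
        echo "secure boot: $(mokutil --sb-state 2>&1 | head -n 1)"
    else
        echo "secure boot: unknown (mokutil not installed)"
    fi

    if command -v modinfo >/dev/null 2>&1; then
        # Empty when the module is unsigned or cannot be found
        signer=$(modinfo -F signer nvidia 2>/dev/null || true)
        echo "nvidia module signer: ${signer:-none found}"
    else
        echo "nvidia module signer: unknown (modinfo not installed)"
    fi
}

check_module_signing
```

If Secure Boot turns out to be enabled and the signer is a locally generated DKMS/MOK key, the usual next question is whether that key was ever enrolled, which `mokutil --list-enrolled` can show.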