NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Hi, we tried to install the 550 drivers on our HPC system and it broke the nvidia-smi command. We’re running Ubuntu 23.04 and were trying to update the NVIDIA drivers, CUDA, and cuDNN to support our modeling pipeline.

When I input the nvidia-smi command, I get:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I’ve purged the NVIDIA packages and reinstalled them from scratch multiple times, but the issue persists. I finally tried installing the 535 driver instead, but it still doesn’t work. I’m attaching the NVIDIA bug report, and it would be great if anyone could assist me with this. Many thanks in advance!

nvidia-bug-report.log.gz (208.3 KB)

GPU: RTX A4000

dkms status

nvidia/535.146.02, 6.2.0-39-generic, x86_64: installed

dpkg -l |grep nvidia

ii  libnvidia-cfg1-535:amd64                   535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-535                       535.146.02-0ubuntu0.23.04.1             all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-525:amd64                525.147.05-0ubuntu0.23.04.1             amd64        NVIDIA libcompute package
ii  libnvidia-compute-535:amd64                535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA libcompute package
ii  libnvidia-decode-535:amd64                 535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64               1:1.1.10-1                              amd64        Wayland EGL External Platform library -- shared library
ii  libnvidia-encode-535:amd64                 535.146.02-0ubuntu0.23.04.1             amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-535:amd64                  535.146.02-0ubuntu0.23.04.1             amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-535:amd64                   535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-535:amd64                     535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
rc  linux-modules-nvidia-525-6.2.0-39-generic  6.2.0-39.40+2                           amd64        Linux kernel nvidia modules for version 6.2.0-39
ii  linux-objects-nvidia-525-6.2.0-39-generic  6.2.0-39.40+2                           amd64        Linux kernel nvidia modules for version 6.2.0-39 (objects)
ii  linux-signatures-nvidia-6.2.0-39-generic   6.2.0-39.40+2                           amd64        Linux kernel signatures for nvidia modules for version 6.2.0-39-generic
rc  nvidia-compute-utils-525                   525.147.05-0ubuntu0.23.04.1             amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-535                   535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA compute utilities
ii  nvidia-dkms-535                            535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA DKMS package
ii  nvidia-driver-535                          535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA driver metapackage
ii  nvidia-firmware-535-535.146.02             535.146.02-0ubuntu0.23.04.1             amd64        Firmware files used by the kernel module
rc  nvidia-kernel-common-525                   525.147.05-0ubuntu0.23.04.1             amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-535                   535.146.02-0ubuntu0.23.04.1             amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-535                   535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA kernel source package
ii  nvidia-prime                               0.8.17.1                                all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            510.47.03-0ubuntu1                      amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-535                           535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                    0.18.3                                  all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-535              535.146.02-0ubuntu0.23.04.1             amd64        NVIDIA binary Xorg driver
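
The listing above mixes package states: several 525 packages are marked rc (removed, but configuration files remain), and linux-objects-nvidia-525 is still installed next to the 535 DKMS packages. A minimal sketch, assuming a POSIX shell with awk, for pulling the leftover rc entries out of dpkg -l output. The function name list_residual_nvidia is hypothetical, not a dpkg feature.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and prints the names of
# NVIDIA-related packages whose state column is "rc" (removed, config files left).
list_residual_nvidia() {
    awk '$1 == "rc" && $2 ~ /nvidia/ { print $2 }'
}
```

Running `dpkg -l | list_residual_nvidia` prints the candidates. Whether purging them (for example with apt-get purge) helps in this case is not established in this thread.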

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)

cc -v

Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 12.3.0-1ubuntu1~23.04' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-DAPbBt/gcc-12-12.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~23.04)

sudo modprobe nvidia

modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
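
"Key was rejected by service" typically means the kernel refused the module's signature, a check that is enforced when Secure Boot is on. The sketch below, assuming a POSIX shell, gathers the facts that usually matter in that case. Both function names are hypothetical helpers, and the category labels are my own shorthand rather than anything modprobe prints.

```shell
#!/bin/sh
# Hypothetical helper: map a modprobe error line to a rough category label.
classify_modprobe_error() {
    case "$1" in
        *"Key was rejected by service"*) echo "signature-rejected" ;;
        *"Required key not available"*)  echo "key-not-available" ;;
        *)                               echo "unknown" ;;
    esac
}

# Hypothetical helper: print kernel, Secure Boot, and module-signing details.
# Every tool is guarded so the function still runs where one is missing.
report_module_signing() {
    echo "Running kernel: $(uname -r)"
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state 2>&1 || true
    else
        echo "mokutil not installed; Secure Boot state unknown"
    fi
    if command -v modinfo >/dev/null 2>&1; then
        # Show which file would be loaded and who signed it, if anyone.
        modinfo nvidia 2>/dev/null | grep -i -E '^(filename|version|signer|sig_key)' \
            || echo "no nvidia module found for the running kernel"
    fi
    if command -v dkms >/dev/null 2>&1; then
        dkms status 2>/dev/null || true
    fi
}
```

Calling `report_module_signing` after a failed modprobe shows whether Secure Boot is enabled and whether the nvidia module that would be loaded carries a signer at all, which narrows down where the rejection comes from.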

Did you resolve this issue? I have the same GPU and the same nvidia-smi error: it cannot reach the GPU with driver version 550 or 560.

Strangely, I can get CUDA 12.6 installed correctly.

I would recommend two things that ended up working out for me:

  1. Make sure that Secure Boot is enabled (this happened to fix the issue for me somehow).
  2. Roll back your drivers if step 1 doesn’t work. I don’t think the changes are that drastic past 520.
