CentOS Stream 8: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

I am having trouble installing an NVIDIA driver on my server.
The server has a GTX 1080 and runs CentOS Stream 8.

# cat /etc/centos-release
CentOS Stream release 8

# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)

I installed the NVIDIA driver according to the instructions in the NVIDIA Driver Installation Quickstart Guide, starting at section 3.3 (Package Installers / CentOS),
followed by the post-installation actions, and completed the mandatory requirements.

When I run nvidia-smi, I get the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Listing the installed NVIDIA packages gives:

# dnf list installed | grep nvidia
dnf-plugin-nvidia.noarch                 2.0-1.el8                                @cuda-rhel8-x86_64
kmod-nvidia-latest-dkms.x86_64           3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver.x86_64                     3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-NVML.x86_64                3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-NvFBCOpenGL.x86_64         3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-cuda.x86_64                3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-cuda-libs.x86_64           3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-devel.x86_64               3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-driver-libs.x86_64                3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-kmod-common.noarch                3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-libXNVCtrl.x86_64                 3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-libXNVCtrl-devel.x86_64           3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-modprobe.x86_64                   3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-persistenced.x86_64               3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-settings.x86_64                   3:515.48.07-1.el8                        @cuda-rhel8-x86_64
nvidia-xconfig.x86_64                    3:515.48.07-1.el8                        @cuda-rhel8-x86_64
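
A quick first check for this error is to see which GPU kernel module is actually loaded, since nvidia-smi cannot talk to the driver if the nvidia module is missing or nouveau is loaded instead. The following is only a minimal sketch: the function name gpu_module_state is made up for illustration, and it assumes the usual /proc/modules format where each line starts with the module name followed by a space.

```shell
# Report which GPU kernel module is currently loaded.
# Reads /proc/modules by default; another file can be passed as $1 for testing.
# gpu_module_state is a hypothetical helper name, not part of any NVIDIA tool.
gpu_module_state() {
    gms_file="${1:-/proc/modules}"
    if grep -q '^nvidia ' "$gms_file" 2>/dev/null; then
        echo "nvidia"
    elif grep -q '^nouveau ' "$gms_file" 2>/dev/null; then
        echo "nouveau"
    else
        echo "none"
    fi
}

gpu_module_state
```

If this prints "nouveau" or "none" even though the driver packages are installed, the problem is most likely on the kernel module side rather than in the user-space packages.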

The NVIDIA bug report is attached:
nvidia-bug-report.log.gz (45.9 KB)

Could someone please help me and tell me what I missed during the installation?

For anyone having the same issue: I have solved the problem. This is what I did:

  1. Uninstall the existing driver installation:
    sudo dnf module remove --all nvidia-driver
  2. Check for an existing CUDA installation and remove it as well:
    sudo dnf remove "cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
     "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*"
  3. Download the desired .run file using curl.
  4. Switch to root.
  5. Disable nouveau by creating /etc/modprobe.d/blacklist-nouveau.conf with the following two lines:
    vi /etc/modprobe.d/blacklist-nouveau.conf

    blacklist nouveau
    options nouveau modeset=0

    I had already disabled nouveau during the previous installation using the recommended workflow, but after rebooting I noticed that it was still loaded.
    Now the most important part, rebuilding the initramfs:
    dracut --force
  6. Reboot your system.
  7. Check that all required packages are installed, then execute the .run file.
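
Steps 5 and 6 can also be done without an editor. The sketch below is only an illustration of those steps: the helper name write_nouveau_blacklist is made up, and the root-only commands are left as comments so that nothing is changed by accident. The dracut --force call matters because the initramfs is built from the configuration present at that time, so an older image can keep loading nouveau at early boot until it is regenerated.

```shell
# Write the two nouveau blacklist lines to the file given as $1.
# write_nouveau_blacklist is a hypothetical helper name used for illustration.
write_nouveau_blacklist() {
    wnb_path="$1"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$wnb_path"
}

# Typical use as root (left commented out on purpose):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   dracut --force          # rebuild the initramfs for the running kernel
#   reboot
#
# After the reboot, this should print nothing if nouveau is really gone:
#   lsmod | grep nouveau
```

Only continue with the .run installer once the lsmod check after the reboot no longer shows nouveau.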

Props to serverworld and this article: CentOS 8 : NVIDIA Tools : Server World
