CentOS 8: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Hi,
I have a Dell G5 laptop with a GeForce RTX 2060 GPU.
# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)

I have CentOS 8 and installed the NVIDIA driver following the instructions at https://docs.nvidia.com/datacenter/tesla/pdf/NVIDIA_Driver_Installation_Quickstart.pdf. However, when I run nvidia-smi, I get the following error: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

dnf list installed | grep nvidia gives:

dnf-plugin-nvidia.noarch 2.0-1.el8 @cuda-rhel8-11-2-local
kmod-nvidia-latest-dkms.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-NVML.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-NvFBCOpenGL.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-cuda.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-cuda-libs.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-devel.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-driver-libs.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-kmod-common.noarch 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-libXNVCtrl.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-libXNVCtrl-devel.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-modprobe.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-persistenced.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-settings.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local
nvidia-xconfig.x86_64 3:460.32.03-1.el8 @cuda-rhel8-11-2-local

The NVIDIA bug report is attached: nvidia-bug-report.log (287.3 KB).

Can someone please help me figure out what is missing?

Thank you

You have secure boot enabled, so the driver can’t load. Please disable it in the BIOS.
Furthermore, you should rather use a repo driver such as the one from RPM Fusion so you don’t need to reinstall the driver on kernel updates.
NB: CentOS has pushed out a broken Xserver update: https://forums.developer.nvidia.com/t/nvidia-linux-x86-64-418-113-wouldnt-build/174775/26?u=generix
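
For anyone following along, the Secure Boot state can also be checked from the running system before rebooting into the BIOS. This is only a minimal sketch: it assumes the `mokutil` tool is installed, and `secure_boot_state` is a hypothetical helper name that just classifies the text `mokutil --sb-state` prints.

```shell
# Classify the text printed by `mokutil --sb-state`.
# secure_boot_state is a hypothetical helper, not part of any tool.
secure_boot_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Typical use on the affected machine (assumes mokutil is installed):
#   secure_boot_state "$(mokutil --sb-state 2>&1)"
```

If this reports "enabled" and the NVIDIA kernel module is unsigned, the kernel is expected to refuse to load it, which would match the nvidia-smi error above.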

Thank you for the reply. When I disabled secure boot, I could not see the UI. There was only a black screen, and I was not even able to log in. After re-enabling secure boot, the UI is back. Please suggest what to do.

If you created an xorg.conf, please delete it. Did you follow the steps in the linked thread?

No, I didn’t create an xorg.conf. I simply unticked the "enable secure boot" option in the BIOS. After the restart, I could not see the login window.

Thanks

Please create a new nvidia-bug-report.log.

You mean after re-enabling secure boot? I am not able to log in with secure boot disabled.

Yes, but after previously having performed a boot with secure boot disabled.

Please find attached bug report.

One more thing I forgot to mention earlier: this is a dual-boot system with CentOS 8 and Windows 10. Thanks

nvidia-bug-report.log.gz (87.8 KB)

You have created an /etc/X11/xorg.conf. Please delete it.
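
A cautious way to do this is to move the file aside rather than delete it, so it can be restored if needed. The following is just a sketch: `backup_xorg_conf` is a hypothetical helper name, and the default path is the one mentioned in this thread.

```shell
# Move an X config out of the way instead of deleting it.
# backup_xorg_conf is a hypothetical helper; the default path is the one
# named in this thread.
backup_xorg_conf() {
    conf="${1:-/etc/X11/xorg.conf}"
    if [ -f "$conf" ]; then
        # Keep a copy next to the original location, with a .bak suffix
        mv -- "$conf" "$conf.bak" && echo "moved $conf to $conf.bak"
    else
        echo "no file at $conf"
    fi
}

# Typical use, as root, followed by a reboot:
#   backup_xorg_conf
# To undo:
#   mv /etc/X11/xorg.conf.bak /etc/X11/xorg.conf
```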

Thank you so much, generix. It’s working now. Is there anything else I need to do?

[root@localhost veeru]# nvidia-smi
Fri Apr 16 21:34:16 2021

Please create a new nvidia-bug-report.log so I can have a look.

Please find the attached report. Thanks

nvidia-bug-report.log.gz (1.07 MB)

Please set the kernel parameter
nvidia-drm.modeset=1
to enable PRIME sync for a tear-free display, then you’re good to go.

How do I set nvidia-drm.modeset=1?
I typed it in as a command and it says not found.

One can easily use a search engine to find something like that:

CentOS 8 no longer uses the config files for this; it uses grubby instead:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kernel-command-line-parameters_managing-monitoring-and-updating-the-kernel

grubby --update-kernel=ALL --args="nvidia-drm.modeset=1"
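
After a reboot, it may be worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a standard /proc/cmdline; `has_kernel_arg` is a hypothetical helper name that checks for an exact, whole-word match:

```shell
# Succeed if the kernel command line in $1 contains the exact argument $2.
# has_kernel_arg is a hypothetical helper, not part of grubby.
has_kernel_arg() {
    # Intentionally unquoted: split the command line on whitespace
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Typical use after rebooting:
#   has_kernel_arg "$(cat /proc/cmdline)" "nvidia-drm.modeset=1" && echo "parameter is active"
# The configured arguments per boot entry can be listed with:
#   grubby --info=ALL | grep args
```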

Thank you. The drivers are working fine now. The only issue I have now is that I am not able to use Windows 10 since I disabled secure boot. Is there any way that I can use Windows as well? Thanks

Windows should boot independently of the secure boot status. Please check in the BIOS whether CSM boot got enabled when you disabled secure boot, and if so, disable it.

Hello,

I have the same problem after updating my kernels through yum update. I tried deleting the xorg.conf and it did not solve the problem.
I made a copy of that file before removing it so I could reverse it, but now I can’t copy it back. Would you please tell me what to do?

I’m running CentOS 7.