Ubuntu 22.04: NVIDIA driver installation error (NVIDIA A10)

Hi,

I’m having a problem installing the NVIDIA graphics driver. After installation, running nvidia-smi gives the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

GPU:

lspci | grep -i nvidia
0002:00:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
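The lspci line above does not show the numeric vendor:device ID that the kernel log reports (10de:2236); `lspci -nn` prints it in brackets. As an illustration, here is a minimal Python sketch that reads the same ID from sysfs. It assumes the standard /sys/bus/pci/devices layout, where each device directory contains `vendor` and `device` files; the function names are hypothetical.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "10de"  # PCI vendor ID shown in the kernel log (10de:2236)


def format_pci_id(vendor_text: str, device_text: str) -> str:
    """Turn sysfs contents such as '0x10de' and '0x2236' into '10de:2236'."""
    def norm(value: str) -> str:
        value = value.strip().lower()
        return value[2:] if value.startswith("0x") else value
    return f"{norm(vendor_text)}:{norm(device_text)}"


def nvidia_pci_devices(root: str = "/sys/bus/pci/devices") -> list:
    """Return (PCI address, vendor:device) pairs for every NVIDIA device under root."""
    base = Path(root)
    if not base.is_dir():
        return []
    found = []
    for dev in sorted(base.iterdir()):
        try:
            pci_id = format_pci_id((dev / "vendor").read_text(),
                                   (dev / "device").read_text())
        except OSError:
            continue  # skip entries without readable ID files
        if pci_id.startswith(NVIDIA_VENDOR_ID + ":"):
            found.append((dev.name, pci_id))
    return found
```

On the VM described here, `nvidia_pci_devices()` would be expected to include ('0002:00:00.0', '10de:2236').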

Here are the relevant NVIDIA entries from the kernel log:

 NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: NVIDIA 535.161.07 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 1394.865378] nvidia: probe of 0002:00:00.0 failed with error -1
[ 1394.865399] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1394.865401] NVRM: None of the NVIDIA devices were initialized.
[ 1394.865622] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[ 1395.217664] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 1395.219433] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: NVIDIA 535.161.07 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 1395.220340] nvidia: probe of 0002:00:00.0 failed with error -1
[ 1395.220383] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1395.220385] NVRM: None of the NVIDIA devices were initialized.
[ 1395.220670] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[ 1395.577519] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 1395.579245] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: NVIDIA 535.161.07 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 1395.580190] nvidia: probe of 0002:00:00.0 failed with error -1
[ 1395.580209] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1395.580211] NVRM: None of the NVIDIA devices were initialized.
[ 1395.580457] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[ 1395.933188] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 1395.934940] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: NVIDIA 535.161.07 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 1395.935904] nvidia: probe of 0002:00:00.0 failed with error -1
[ 1395.935929] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1395.935931] NVRM: None of the NVIDIA devices were initialized.
[ 1395.936165] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[ 1396.289810] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 1396.291904] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: NVIDIA 535.161.07 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 1396.292825] nvidia: probe of 0002:00:00.0 failed with error -1
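The same handful of messages repeats every fraction of a second as the module is retried. To condense a long dmesg capture into a summary, a small Python sketch could look like the one below. The regular expressions are written against the exact lines in this post; other driver branches may word the messages differently, and the function name is hypothetical.

```python
import re

# Patterns for the kernel messages quoted above (assumed, not guaranteed, for other versions).
PROBE_FAILED = re.compile(r"nvidia: probe of (\S+) failed with error (-?\d+)")
GPU_LINE = re.compile(r"NVRM: The NVIDIA GPU (\S+) \(PCI ID: ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\)")


def summarize_probe_failures(dmesg_text: str) -> dict:
    """Count 'probe ... failed' messages and list the affected devices and PCI IDs."""
    failures = PROBE_FAILED.findall(dmesg_text)
    gpus = GPU_LINE.findall(dmesg_text)
    return {
        "probe_failures": len(failures),
        "error_codes": sorted({int(code) for _, code in failures}),
        "devices": sorted({addr for addr, _ in failures}),
        "pci_ids": sorted({pci_id for _, pci_id in gpus}),
    }
```

You would pass it the text of `dmesg` (for example a saved capture); for the log above it should report one device, 0002:00:00.0, with PCI ID 10de:2236 and error code -1.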

dkms status

nvidia/535.161.07, 6.5.0-1015-azure, x86_64: installed

Kernel:
6.5.0-1015-azure #15~22.04.1-Ubuntu
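One way to rule out a kernel/module build mismatch is to confirm that dkms built the module for the kernel that is actually running (the value of `uname -r`). A minimal sketch, assuming the `module/version, kernel, arch: status` format shown above; older dkms releases print a different layout that this regex does not handle, and the function names are hypothetical.

```python
import platform
import re
from typing import Optional

# Matches lines such as "nvidia/535.161.07, 6.5.0-1015-azure, x86_64: installed".
DKMS_LINE = re.compile(
    r"^(?P<module>[^/,\s]+)/(?P<version>[^,]+),\s*(?P<kernel>[^,]+),"
    r"\s*(?P<arch>[^:]+):\s*(?P<status>.+)$"
)


def dkms_builds_for(dkms_output: str, module: str = "nvidia") -> list:
    """Parse `dkms status` output into dicts for the given module."""
    builds = []
    for line in dkms_output.splitlines():
        m = DKMS_LINE.match(line.strip())
        if m and m.group("module") == module:
            builds.append(m.groupdict())
    return builds


def built_for_running_kernel(dkms_output: str, running_kernel: Optional[str] = None) -> bool:
    """True if an installed build exists for the running kernel (uname -r by default)."""
    running = running_kernel or platform.release()
    return any(b["kernel"] == running and b["status"].startswith("installed")
               for b in dkms_builds_for(dkms_output))
```

With the output above and a running kernel of 6.5.0-1015-azure, this check should return True, which points away from a module built for the wrong kernel.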

Any hints as to what the problem could be?

It looks like you’re on an Azure VM, so you need to use the GRID driver, not the regular graphics driver. It should be downloadable from Microsoft.
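Besides the `-azure` suffix on the kernel version, the DMI data in sysfs hints at the platform. A minimal sketch, assuming that Hyper-V guests (Azure VMs included) report "Microsoft Corporation" in /sys/class/dmi/id/sys_vendor; on-premises Hyper-V reports the same string, so treat a match as a hint rather than proof. The function names are hypothetical.

```python
from pathlib import Path


def read_dmi_field(field: str, dmi_dir: str = "/sys/class/dmi/id") -> str:
    """Read one DMI/SMBIOS field from sysfs; return '' if it is missing or unreadable."""
    try:
        return (Path(dmi_dir) / field).read_text().strip()
    except OSError:
        return ""


def is_hyperv_vendor(sys_vendor: str) -> bool:
    """Assumption: Hyper-V guests, including Azure VMs, report this system vendor."""
    return sys_vendor.strip() == "Microsoft Corporation"
```

Typical use: `is_hyperv_vendor(read_dmi_field("sys_vendor"))`.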

Thanks for the response.

I am installing a GRID driver, but the installation still does not succeed. During installation, I get this error message:

Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module ‘nvidia-modeset.ko’. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release. Please see the log entries ‘Kernel module load error’ and ‘Kernel messages’ at the end of the file ‘/var/log/nvidia-installer.log’ for more information.
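The message lists nouveau as one possible cause. To check which of the relevant modules are loaded, a small sketch could parse /proc/modules, where the module name is the first field of each line. The conflict set containing only nouveau is an assumption taken from the message above, and the function names are hypothetical.

```python
from pathlib import Path

CONFLICTING_MODULES = {"nouveau"}  # the driver named in the installer message


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Return the raw text of /proc/modules."""
    return Path(path).read_text()


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules text (first whitespace-separated field per line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def conflicting_modules(proc_modules_text: str) -> set:
    """Loaded modules that may keep the NVIDIA driver from claiming the GPU."""
    return loaded_modules(proc_modules_text) & CONFLICTING_MODULES


def nvidia_modules(proc_modules_text: str) -> list:
    """NVIDIA modules currently loaded (names use underscores, e.g. nvidia_drm)."""
    return sorted(m for m in loaded_modules(proc_modules_text) if m.startswith("nvidia"))
```

Typical use: `conflicting_modules(read_proc_modules())` should return an empty set before running the installer.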

Below is the log from /var/log/nvidia-installer.log:

WARNING: Unable to determine the path to install the libglvnd EGL vendor library config files. Check that you have pkg-config and the libglvnd development libraries installed, or specify a path with --glvnd-egl-config-path.
Will install libEGL vendor library config file to /usr/share/glvnd/egl_vendor.d
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (535.154.05):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/sbin/depmod -a '...
   executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
ERROR: Unable to load the 'nvidia-drm' kernel module.
-> Kernel messages:
[ 2039.992170] nvidia-nvlink: Unregistered Nvlink Core, major device number 237
[ 2586.344866] VFIO - User Level meta-driver version: 0.3
[ 2586.472069] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 2586.472078] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.154.05  Thu Dec 28 15:37:48 UTC 2023
[ 2586.491479] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 2586.515387] nvidia-uvm: Loaded the UVM driver, major device number 235.
[ 2586.519610] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.154.05  Thu Dec 28 15:51:29 UTC 2023
[ 2586.522019] [drm] [nvidia-drm] [GPU ID 0x00020000] Loading driver
[ 2586.522022] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0002:00:00.0 on minor 0
[ 2586.525427] [drm] [nvidia-drm] [GPU ID 0x00020000] Unloading driver
[ 2586.544181] nvidia-modeset: Unloading
[ 2586.576648] nvidia-uvm: Unloaded the UVM driver.
[ 2586.601971] nvidia-nvlink: Unregistered Nvlink Core, major device number 237
[ 2616.517463] nvidia-nvlink: Nvlink Core is being initialized, major device number 237

[ 2616.524107] NVRM: The NVIDIA GPU 0002:00:00.0 (PCI ID: 10de:2236)
               NVRM: installed in this system is not supported by the
               NVRM: NVIDIA 550.54.14 driver release.
               NVRM: Please see 'Appendix A - Supported NVIDIA GPU Products'
               NVRM: in this release's README, available on the operating system
               NVRM: specific graphics driver download page at www.nvidia.com.
[ 2616.525352] nvidia: probe of 0002:00:00.0 failed with error -1
[ 2616.525376] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2616.525378] NVRM: None of the NVIDIA devices were initialized.
[ 2616.525635] nvidia-nvlink: Unregistered Nvlink Core, major device number 237
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
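The kernel messages in this log mention two driver versions: a 535.154.05 module that loads, and a 550.54.14 module whose probe rejects the GPU. A sketch that pulls both out of the log text, with regexes written against these exact lines (an assumption for other versions) and hypothetical function names:

```python
import re

# Version that the module reports while loading successfully.
LOADED = re.compile(r"NVRM: loading NVIDIA UNIX \S+ Kernel Module\s+(\d+(?:\.\d+)+)")
# In these logs this line only appears inside the "not supported" message.
REJECTED = re.compile(r"NVRM: NVIDIA (\d+(?:\.\d+)+) driver release")


def driver_versions(kernel_messages: str) -> dict:
    """Versions that loaded vs. versions named in a 'not supported' message."""
    return {
        "loaded": sorted(set(LOADED.findall(kernel_messages))),
        "rejected": sorted(set(REJECTED.findall(kernel_messages))),
    }


def mixed_driver_versions(kernel_messages: str) -> bool:
    """True if more than one driver version appears in the same log."""
    v = driver_versions(kernel_messages)
    return len(set(v["loaded"]) | set(v["rejected"])) > 1
```

For the log above this should report 535.154.05 as loaded and 550.54.14 as rejected.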

It looks like you already had a compatible 535.154.05 driver installed and are trying to install an incompatible 550.54.14 driver over it.
If you are on a vGPU VM, you can only install the GRID driver version that matches the vGPU software installed on the underlying host. Driver 535 corresponds to vGPU 16; driver 550 requires the host to run vGPU 17.
https://docs.nvidia.com/grid/
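The rule in the reply above can be written as a lookup from the host's vGPU release to the guest driver branch. This sketch contains only the two pairings given in the reply (535 for vGPU 16, 550 for vGPU 17) and treats them as strict; NVIDIA's vGPU release notes are the authority for other releases and for any cross-release compatibility. The host's vGPU release is an input you must obtain yourself (the sketch does not detect it), and the names are hypothetical.

```python
from typing import Optional

# Only the pairings stated in the reply above.
DRIVER_BRANCH_FOR_VGPU_RELEASE = {16: 535, 17: 550}


def driver_branch(version: str) -> int:
    """'535.154.05' -> 535"""
    return int(version.split(".")[0])


def guest_driver_matches_host(guest_driver_version: str, host_vgpu_release: int) -> Optional[bool]:
    """True/False for the known pairings, None if the host release is not in the table."""
    expected = DRIVER_BRANCH_FOR_VGPU_RELEASE.get(host_vgpu_release)
    if expected is None:
        return None
    return driver_branch(guest_driver_version) == expected
```

For example, on a vGPU 16 host, 535.154.05 matches and 550.54.14 does not.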

Hello,

I am encountering the same issue. Have you solved it?