A6000 Nvidia Driver Installation Error (Ubuntu Server 24, Kernel 6.8)

I have spent a few days trying to install drivers for an A6000 on Ubuntu Server 24. I have tried 535, 545, 500 and 550, and I always get the same error when running nvidia-smi:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The procedure I follow to install the driver is:

sudo apt install nvidia-driver-5XX
sudo update-initramfs -u
sudo reboot

Then to uninstall and test another driver I use:

sudo apt remove --purge 'nvidia-*'
sudo apt autoremove
sudo apt autoclean

I have tried many other things. My PC has the following firmware settings:

  • Secure Boot disabled
  • CSM disabled
  • Above 4G Decoding enabled

I have also attached the bug report. It looked the same to me after every installation, but I can’t really tell what is going on in it.

nvidia-bug-report.log.gz (78.6 KB)

The kernel modules are missing. Please post the output of
dpkg -l |grep nvidia
and
dkms status

The output from dpkg -l | grep nvidia is:

ii  libnvidia-cfg1-545:amd64              545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-545                  545.29.06-0ubuntu0~gpu24.04.1           all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-545:amd64           545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA libcompute package
ii  libnvidia-decode-545:amd64            545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-545:amd64            545.29.06-0ubuntu0~gpu24.04.1           amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-545:amd64             545.29.06-0ubuntu0~gpu24.04.1           amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-545:amd64              545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-545:amd64                545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-ml-dev:amd64                12.0.140~12.0.1-4build4                 amd64        NVIDIA Management Library (NVML) development files
ii  nvidia-compute-utils-545              545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA compute utilities
ii  nvidia-cuda-dev:amd64                 12.0.146~12.0.1-4build4                 amd64        NVIDIA CUDA development files
ii  nvidia-cuda-gdb                       12.0.140~12.0.1-4build4                 amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                   12.0.140~12.0.1-4build4                 amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc               12.0.1-4build4                          all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-dkms-545                       545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA DKMS package
ii  nvidia-driver-545                     545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA driver metapackage
ii  nvidia-firmware-545-545.29.06         545.29.06-0ubuntu0~gpu24.04.1           amd64        Firmware files used by the kernel module
ii  nvidia-kernel-common-545              545.29.06-0ubuntu0~gpu24.04.1           amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-545              545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA kernel source package
ii  nvidia-opencl-dev:amd64               12.0.140~12.0.1-4build4                 amd64        NVIDIA OpenCL development files
ii  nvidia-prime                          0.8.17.2                                all          Tools to enable NVIDIA's Prime
ii  nvidia-profiler                       12.0.146~12.0.1-4build4                 amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                       510.47.03-0ubuntu4                      amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-545                      545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA driver support binaries
ii  nvidia-visual-profiler                12.0.146~12.0.1-4build4                 amd64        NVIDIA Visual Profiler for CUDA and OpenCL
ii  screen-resolution-extra               0.18.3                                  all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-545         545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA binary Xorg driver

And for dkms status:
nvidia/545.29.06: added
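
A note for later readers: a state of "added" means the driver source is registered with DKMS, but no kernel module has been built or installed from it. The sketch below filters dkms status output for entries that are not in the "installed" state. The function name dkms_pending is made up for this example, and it assumes the state is the text after the last ": " on each line, as in the output above.

```shell
# Print every line of `dkms status` output (read from stdin) whose state
# is not "installed". A line such as "nvidia/545.29.06: added" means the
# source is registered with DKMS but no module has been built from it.
# dkms_pending is a hypothetical helper name, not part of DKMS.
dkms_pending() {
    # Split on ": " so the last field holds the state.
    awk -F': ' 'NF > 1 && $NF !~ /^installed/ { print }'
}
```

It could be used as: dkms status | dkms_pending. Any line it prints points at a module that still has to be built for the running kernel.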

Are they loaded and in use? Should look something like this:

$ lsmod|grep nvidia
nvidia_uvm           5251072  2
nvidia_drm            126976  258
nvidia_modeset       1622016  24 nvidia_drm
video                  77824  1 nvidia_modeset
nvidia              60981248  1064 nvidia_uvm,nvidia_modeset
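
The same check can be scripted rather than read by eye. This is only a minimal sketch: the function name nvidia_module_loaded is invented for the example, and it looks for the core nvidia module alone, on the assumption that nvidia_uvm, nvidia_drm and nvidia_modeset are not all needed on every system.

```shell
# Succeed (exit status 0) if the core "nvidia" kernel module appears in
# `lsmod` output read from stdin; fail (exit status 1) otherwise.
# nvidia_module_loaded is a hypothetical helper name.
nvidia_module_loaded() {
    # The first column of lsmod output is the module name. Match it
    # exactly so that nvidia_uvm, nvidia_drm, etc. do not count.
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}
```

For example: lsmod | nvidia_module_loaded && echo "nvidia module is loaded".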

lsmod | grep nvidia is the command that shows nothing. I have tried it several times and never got any output.

Driver 545 is installed but this won’t compile on newer kernels. Please either downgrade to 535 or upgrade to 550/555.

I have installed 550 and these are the outputs of the commands:

$ dpkg -l | grep nvidia
ii  libnvidia-cfg1-550:amd64              550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-550                  550.78-0ubuntu0~gpu24.04.1              all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-545:amd64           545.29.06-0ubuntu0~gpu24.04.1           amd64        NVIDIA libcompute package
ii  libnvidia-compute-550:amd64           550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA libcompute package
ii  libnvidia-decode-550:amd64            550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-550:amd64            550.78-0ubuntu0~gpu24.04.1              amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-550:amd64             550.78-0ubuntu0~gpu24.04.1              amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-550:amd64              550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-550:amd64                550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  nvidia-compute-utils-550              550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA compute utilities
ii  nvidia-dkms-550                       550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA DKMS package
ii  nvidia-driver-550                     550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA driver metapackage
ii  nvidia-firmware-550-550.78            550.78-0ubuntu0~gpu24.04.1              amd64        Firmware files used by the kernel module
ii  nvidia-kernel-common-550              550.78-0ubuntu0~gpu24.04.1              amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-550              550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA kernel source package
ii  nvidia-prime                          0.8.17.2                                all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                       510.47.03-0ubuntu4                      amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-550                      550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA driver support binaries
ii  screen-resolution-extra               0.18.3                                  all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-550         550.78-0ubuntu0~gpu24.04.1              amd64        NVIDIA binary Xorg driver
$ dkms status
nvidia/550.78: added

And lsmod | grep nvidia prints nothing.

And I attach the new debug log.

I have tried these driver versions before with no success.
nvidia-bug-report.log.gz (80.3 KB)
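
A side note on the dpkg listing above: the "rc" in the first column of the libnvidia-compute-545 line means the package was removed but its configuration files are still present. Such leftovers are usually harmless, but when switching between driver branches it can help to see them at a glance. The sketch below extracts them from dpkg -l output; the function name dpkg_residual is invented for this example.

```shell
# Print the names of packages that `dpkg -l` reports in state "rc"
# (removed, configuration files remaining). Reads dpkg -l output from
# stdin. dpkg_residual is a hypothetical helper name.
dpkg_residual() {
    # Column 1 is the two-letter state, column 2 the package name.
    awk '$1 == "rc" { print $2 }'
}
```

For example: dpkg -l | dpkg_residual. Each name it prints can then be passed to sudo apt purge if a completely clean state is wanted.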

Please try reinstalling the kernel headers
sudo apt install --reinstall linux-headers-$(uname -r)
and post any errors displayed.
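
For context: DKMS compiles the module against the kernel build tree at /lib/modules/<release>/build, which on Ubuntu is normally provided by the matching linux-headers package. The apt output in the next post lists linux-headers-6.8.0-31-generic as a newly installed package, which suggests the headers were missing until this point. A quick way to test for that is sketched below. The function name kernel_headers_present and its second parameter are assumptions made for this example.

```shell
# Succeed if a kernel build tree exists for the given kernel release.
#   $1: kernel release; defaults to the running kernel (`uname -r`)
#   $2: filesystem root; a hypothetical parameter that exists only so
#       the function can be tried against a scratch directory
# kernel_headers_present is a hypothetical helper name.
kernel_headers_present() {
    release=${1:-$(uname -r)}
    root=${2:-/}
    # -d follows symlinks, so this also works when "build" is a link
    # into /usr/src, as is usual on Ubuntu.
    [ -d "${root%/}/lib/modules/${release}/build" ]
}
```

For example: kernel_headers_present || echo "headers missing for $(uname -r)".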

Everything seems to work:

$  sudo apt install --reinstall linux-headers-$(uname -r)
[sudo] password for javierj: 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  linux-headers-6.8.0-31
The following NEW packages will be installed:
  linux-headers-6.8.0-31 linux-headers-6.8.0-31-generic
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 17.5 MB of archives.
After this operation, 114 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://es.archive.ubuntu.com/ubuntu noble/main amd64 linux-headers-6.8.0-31 all 6.8.0-31.31 [13.6 MB]
Get:2 http://es.archive.ubuntu.com/ubuntu noble/main amd64 linux-headers-6.8.0-31-generic amd64 6.8.0-31.31 [3,866 kB]
Fetched 17.5 MB in 1s (16.2 MB/s)
Selecting previously unselected package linux-headers-6.8.0-31.
(Reading database ... 94650 files and directories currently installed.)
Preparing to unpack .../linux-headers-6.8.0-31_6.8.0-31.31_all.deb ...
Unpacking linux-headers-6.8.0-31 (6.8.0-31.31) ...
Selecting previously unselected package linux-headers-6.8.0-31-generic.
Preparing to unpack .../linux-headers-6.8.0-31-generic_6.8.0-31.31_amd64.deb ...
Unpacking linux-headers-6.8.0-31-generic (6.8.0-31.31) ...
Setting up linux-headers-6.8.0-31 (6.8.0-31.31) ...
Setting up linux-headers-6.8.0-31-generic (6.8.0-31.31) ...
/etc/kernel/header_postinst.d/dkms:
 * dkms: running auto installation service for kernel 6.8.0-31-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der

Building module:
Cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=6.8.0-31-generic IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/6.8.0-31-generic/build LD=/usr/bin/ld.bfd CONFIG_X86_KERNEL_IBT= modules........
Signing module /var/lib/dkms/nvidia/550.78/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/550.78/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/550.78/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/550.78/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/550.78/build/nvidia-peermem.ko
Cleaning build area...

nvidia.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-31-generic/updates/dkms/

nvidia-modeset.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-31-generic/updates/dkms/

nvidia-drm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-31-generic/updates/dkms/

nvidia-uvm.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-31-generic/updates/dkms/

nvidia-peermem.ko.zst:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/6.8.0-31-generic/updates/dkms/
depmod...
dkms autoinstall on 6.8.0-31-generic/x86_64 succeeded for nvidia
 * dkms: autoinstall for kernel 6.8.0-31-generic
   ...done.
Scanning processes...
Scanning processor microcode...
Scanning linux images...

Running kernel seems to be up-to-date.

The processor microcode seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.

But now the error message from nvidia-smi is different:

$ nvidia-smi
No devices were found

And lsmod shows something:

$ lsmod|grep nvidia
nvidia_uvm           4931584  0
nvidia_drm            122880  0
nvidia_modeset       1355776  1 nvidia_drm
nvidia              54239232  2 nvidia_uvm,nvidia_modeset
video                  73728  3 asus_wmi,asus_nb_wmi,nvidia_modeset

I have rebooted and nvidia-smi works! Thank you so much!

$ nvidia-smi
Mon May 27 14:11:02 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:01:00.0 Off |                  Off |
| 30%   36C    P8             10W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+