Device not found (Ubuntu 20.04 / Dell Precision / RTX A4000 / RmInitAdapter failed)

Hello,

I am unable to get my GPU working on Ubuntu 20.04 LTS.
The GPU is an RTX A4000.

Here are my bug report and kern.log
The latter says:
Feb 8 07:35:47 loicus-DA kernel: [ 288.919473] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0xffff:1401)
Feb 8 07:35:47 loicus-DA kernel: [ 288.919576] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 8 07:35:54 loicus-DA kernel: [ 296.096457] NVRM: Xid (PCI:0000:01:00): 79, pid=5156, GPU has fallen off the bus.
Feb 8 07:35:54 loicus-DA kernel: [ 296.096508] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Feb 8 07:35:54 loicus-DA kernel: [ 296.097573] NVRM: A GPU crash dump has been created. If possible, please run
Feb 8 07:35:54 loicus-DA kernel: [ 296.097573] NVRM: nvidia-bug-report.sh as root to collect this data before
Feb 8 07:35:54 loicus-DA kernel: [ 296.097573] NVRM: the NVIDIA kernel module is unloaded.
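For anyone reading a long kern.log, the Xid codes are usually the quickest thing to look for. Below is a minimal sketch, assuming the log lines have the same layout as the excerpt above; extract_xids is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: print the Xid code from every "NVRM: Xid" line on stdin.
# Assumes the "NVRM: Xid (PCI:...): <code>, ..." layout seen in the excerpt above.
extract_xids() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}
```

It could be used as `extract_xids < /var/log/kern.log | sort | uniq -c` to count how often each code appears. Lines without an Xid entry, such as the RmInitAdapter messages, are ignored.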

I tried reinstalling everything from scratch, reinstalling the drivers in several different ways, etc.
Nothing is working. I suspect the GPU is dead, but I’d be thankful for a confirmation.

nvidia-bug-report.log (1.0 MB)
kern.log (139.0 KB)

Generix, please Help!!!

Thanks in advance,
Loic

Since this is a laptop, the GPU is not necessarily broken. It falls off the bus first, which would point to a power management, bus, or kernel problem. Please try:

  • updating bios
  • setting kernel parameter intel_idle.max_cstate=1
  • use a different kernel
    The ubuntu 5.13 kernel was released with a lot of bugs, please check if you have a 5.11 kernel available in grub menu or try using the liquorix kernel ppa:
    https://launchpad.net/~damentz/+archive/ubuntu/liquorix

Updating BIOS: I tried to do this. Is this what you mean?
loicus@loicus-DA:~$ sudo fwupdmgr refresh --force
Updating lvfs
Downloading…             [***************************************]
Successfully downloaded new metadata: 1 local device supported
loicus@loicus-DA:~$ sudo fwupdmgr update
Devices with no available firmware updates: 
 • PM9A1 NVMe Samsung 1024GB
 • PM9A1 NVMe Samsung 1024GB
 • UEFI Device Firmware
 • UEFI Device Firmware
 • UEFI dbx
Devices with the latest available firmware version:
 • System Firmware

I’ve set in my /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=1"

Then I ran “sudo update-grub”

But it doesn’t change anything.
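One thing worth ruling out here: update-grub only rewrites the boot configuration, so the parameter applies from the next boot onward. A minimal sketch for confirming that the running kernel actually received it follows; cmdline_has_param is a hypothetical helper name, and the check is a plain whole-word match on the command line text.

```shell
# Hypothetical helper: succeed if parameter $1 appears as a whole word
# in the kernel command line text given as $2.
cmdline_has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;   # parameter found
        *)        return 1 ;;   # parameter missing
    esac
}
```

After a reboot it could be called as `cmdline_has_param intel_idle.max_cstate=1 "$(cat /proc/cmdline)" && echo active`. If nothing is printed, the setting from /etc/default/grub did not reach the booted kernel.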

So I went further and installed the liquorix kernel.
This leads to the following message from “nvidia-smi”:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

That sounds like a step in the right direction to me, but we are not yet there :-)

I tried to uninstall and reinstall all the NVIDIA packages, but it didn’t help:

sudo apt purge nvidia*
sudo ubuntu-drivers autoinstall

Here is the new bug report:
nvidia-bug-report.log (471.4 KB)

Thanks for your help, it’s really appreciated!

It seems the kernel modules didn’t compile. Please reinstall the kernel headers:
sudo apt install linux-headers-$(uname -r)
then post the output of
dkms status

loicus@loicus-DA:~$ sudo apt --reinstall install linux-headers-$(uname -r)
[sudo] password for loicus: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  apt-clone archdetect-deb dmraid gir1.2-timezonemap-1.0 gir1.2-xkl-1.0 glib-networking:i386 gstreamer1.0-plugins-base:i386 kpartx kpartx-boot libapparmor1:i386 libargon2-1:i386 libasyncns0:i386
  libbrotli1:i386 libcairo2:i386 libcap2:i386 libcdparanoia0:i386 libdbus-1-3:i386 libdebian-installer4 libdevmapper1.02.1:i386 libdmraid1.0.0.rc16 libflac8:i386 libfontconfig1:i386 libfreetype6:i386
  libglib2.0-0:i386 libgmp10:i386 libgnutls30:i386 libgomp1:i386 libgssapi-krb5-2:i386 libgstreamer-plugins-base1.0-0:i386 libgstreamer1.0-0:i386 libhogweed5:i386 libice6:i386 libicu66:i386 libip4tc2:i386
  libjack-jackd2-0:i386 libjson-c4:i386 libjson-glib-1.0-0:i386 libk5crypto3:i386 libkeyutils1:i386 libkrb5-3:i386 libkrb5support0:i386 libltdl7:i386 libnettle7:i386 libogg0:i386 libopus0:i386
  liborc-0.4-0:i386 libp11-kit0:i386 libpixman-1-0:i386 libpng16-16:i386 libproxy1v5:i386 libpsl5:i386 libsamplerate0:i386 libseccomp2:i386 libsm6:i386 libsnapd-glib1:i386 libsndfile1:i386 libsoup2.4-1:i386
  libsoxr0:i386 libspeexdsp1:i386 libsqlite3-0:i386 libssl1.1:i386 libtasn1-6:i386 libtdb1:i386 libtheora0:i386 libtimezonemap-data libtimezonemap1 libvisual-0.4-0:i386 libvorbis0a:i386 libvorbisenc2:i386
  libwebrtc-audio-processing1:i386 libwrap0:i386 libxcb-render0:i386 libxml2:i386 libxrender1:i386 libxtst6:i386 linux-headers-5.13.0-1010-oem linux-image-5.13.0-1010-oem linux-modules-5.13.0-1010-oem
  linux-oem-5.13-headers-5.13.0-1010 python3-icu python3-pam rdate
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 0 B/12,0 MB of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 218744 files and directories currently installed.)
Preparing to unpack .../linux-headers-5.16.0-7.2-liquorix-amd64_5.16-5ubuntu1~focal_amd64.deb ...
Unpacking linux-headers-5.16.0-7.2-liquorix-amd64 (5.16-5ubuntu1~focal) over (5.16-5ubuntu1~focal) ...
Setting up linux-headers-5.16.0-7.2-liquorix-amd64 (5.16-5ubuntu1~focal) ...
/etc/kernel/header_postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.16.0-7.2-liquorix-amd64
   ...done.
loicus@loicus-DA:~$ dkms status
loicus@loicus-DA:~$ 
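The empty `dkms status` output above means DKMS has no module registered at all, so nothing could be built for the running kernel. A minimal sketch for checking this per kernel follows, assuming the status lines start with the module name and end in "installed"; dkms_built_for is a hypothetical helper name.

```shell
# Hypothetical helper: read `dkms status` output on stdin and succeed only if
# an nvidia module is reported as installed for the kernel release given as $1.
dkms_built_for() {
    grep -F "$1" | grep -q '^nvidia.*installed'
}
```

It could be used as `dkms status | dkms_built_for "$(uname -r)" || echo "no NVIDIA module for this kernel"`. With empty input, as in the output above, the check fails.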

Please post the output of
dpkg -l |grep nvidia

loicus@loicus-DA:~$ dpkg -l |grep nvidia
ii  libnvidia-cfg1-510:amd64                      510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-510                          510.47.03-0ubuntu0.20.04.1            all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-470:amd64                   470.103.01-0ubuntu0.20.04.1           amd64        NVIDIA libcompute package
rc  libnvidia-compute-470-server:amd64            470.103.01-0ubuntu0.20.04.1           amd64        NVIDIA libcompute package
ii  libnvidia-compute-510:amd64                   510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA libcompute package
ii  libnvidia-compute-510:i386                    510.47.03-0ubuntu0.20.04.1            i386         NVIDIA libcompute package
ii  libnvidia-decode-510:amd64                    510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-510:i386                     510.47.03-0ubuntu0.20.04.1            i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-510:amd64                    510.47.03-0ubuntu0.20.04.1            amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-510:i386                     510.47.03-0ubuntu0.20.04.1            i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-510:amd64                     510.47.03-0ubuntu0.20.04.1            amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-510:amd64                      510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-510:i386                       510.47.03-0ubuntu0.20.04.1            i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-510:amd64                        510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-510:i386                         510.47.03-0ubuntu0.20.04.1            i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  linux-modules-nvidia-510-5.13.0-1029-oem      5.13.0-1029.36+1                      amd64        Linux kernel nvidia modules for version 5.13.0-1029
ii  linux-modules-nvidia-510-oem-20.04c           5.13.0-1029.36+1                      amd64        Extra drivers for nvidia-510 for the oem-20.04c flavour
rc  linux-objects-nvidia-470-5.11.0-1028-aws      5.11.0-1028.31~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.11.0-1028 (objects)
rc  linux-objects-nvidia-470-5.11.0-1028-azure    5.11.0-1028.31~20.04.2+1              amd64        Linux kernel nvidia modules for version 5.11.0-1028 (objects)
rc  linux-objects-nvidia-470-5.11.0-1028-oracle   5.11.0-1028.31~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.11.0-1028 (objects)
rc  linux-objects-nvidia-470-5.11.0-1029-gcp      5.11.0-1029.33~20.04.3+1              amd64        Linux kernel nvidia modules for version 5.11.0-1029 (objects)
rc  linux-objects-nvidia-470-5.13.0-1012-aws      5.13.0-1012.13~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.13.0-1012 (objects)
rc  linux-objects-nvidia-470-5.13.0-1013-azure    5.13.0-1013.15~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.13.0-1013 (objects)
rc  linux-objects-nvidia-470-5.13.0-1013-gcp      5.13.0-1013.16~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.13.0-1013 (objects)
rc  linux-objects-nvidia-470-5.13.0-1016-oracle   5.13.0-1016.20~20.04.1+1              amd64        Linux kernel nvidia modules for version 5.13.0-1016 (objects)
rc  linux-objects-nvidia-470-5.13.0-1029-oem      5.13.0-1029.36+1                      amd64        Linux kernel nvidia modules for version 5.13.0-1029 (objects)
rc  linux-objects-nvidia-470-5.13.0-28-generic    5.13.0-28.31~20.04.1+2                amd64        Linux kernel nvidia modules for version 5.13.0-28 (objects)
rc  linux-objects-nvidia-470-5.13.0-28-lowlatency 5.13.0-28.31~20.04.1+2                amd64        Linux kernel nvidia modules for version 5.13.0-28 (objects)
rc  linux-objects-nvidia-470-5.4.0-1062-oracle    5.4.0-1062.66+1                       amd64        Linux kernel nvidia modules for version 5.4.0-1062 (objects)
rc  linux-objects-nvidia-470-5.4.0-1063-gcp       5.4.0-1063.67+1                       amd64        Linux kernel nvidia modules for version 5.4.0-1063 (objects)
rc  linux-objects-nvidia-470-5.4.0-1064-aws       5.4.0-1064.67+1                       amd64        Linux kernel nvidia modules for version 5.4.0-1064 (objects)
rc  linux-objects-nvidia-470-5.4.0-1068-azure     5.4.0-1068.71+1                       amd64        Linux kernel nvidia modules for version 5.4.0-1068 (objects)
rc  linux-objects-nvidia-470-5.4.0-99-generic     5.4.0-99.112+1                        amd64        Linux kernel nvidia modules for version 5.4.0-99 (objects)
rc  linux-objects-nvidia-470-5.4.0-99-lowlatency  5.4.0-99.112+1                        amd64        Linux kernel nvidia modules for version 5.4.0-99 (objects)
ii  linux-objects-nvidia-510-5.13.0-1029-oem      5.13.0-1029.36+1                      amd64        Linux kernel nvidia modules for version 5.13.0-1029 (objects)
ii  linux-signatures-nvidia-5.13.0-1029-oem       5.13.0-1029.36+1                      amd64        Linux kernel signatures for nvidia modules for version 5.13.0-1029-oem
rc  nvidia-compute-utils-470-server               470.103.01-0ubuntu0.20.04.1           amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-510                      510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA compute utilities
rc  nvidia-dkms-470-server                        470.103.01-0ubuntu0.20.04.1           amd64        NVIDIA DKMS package
ii  nvidia-driver-510                             510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA driver metapackage
rc  nvidia-kernel-common-470-server               470.103.01-0ubuntu0.20.04.1           amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-510                      510.47.03-0ubuntu0.20.04.1            amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-510                      510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA kernel source package
ii  nvidia-prime                                  0.8.16~0.20.04.1                      all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                               470.57.01-0ubuntu0.20.04.2            amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-510                              510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                       0.18build1                            all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-510                 510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA binary Xorg driver

I imagine I should purge everything that is not 510?

Yes, it’s a wild mix of 470-server and 510, with neither driver being complete. Rather, remove everything *nvidia* and reinstall using the Software & Updates application.
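In the listing above, the "rc" rows are packages that were removed but left their configuration files behind, and the 510 set has no nvidia-dkms-510 package, only prebuilt modules for 5.13.0-1029-oem. A minimal sketch for pulling out the leftover package names follows, assuming standard `dpkg -l` columns; list_rc_packages is a hypothetical helper name.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages in state "rc" (removed, configuration files still present).
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}
```

It could be used as `dpkg -l | grep nvidia | list_rc_packages` to review the leftovers before purging them.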

loicus@loicus-DA:~$ dpkg -l |grep nvidia
ii  libnvidia-cfg1-510:amd64                   510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-510                       510.47.03-0ubuntu0.20.04.1            all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-510:amd64                510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA libcompute package
ii  libnvidia-compute-510:i386                 510.47.03-0ubuntu0.20.04.1            i386         NVIDIA libcompute package
ii  libnvidia-decode-510:amd64                 510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-510:i386                  510.47.03-0ubuntu0.20.04.1            i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-510:amd64                 510.47.03-0ubuntu0.20.04.1            amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-510:i386                  510.47.03-0ubuntu0.20.04.1            i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-510:amd64                  510.47.03-0ubuntu0.20.04.1            amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-510:amd64                   510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-510:i386                    510.47.03-0ubuntu0.20.04.1            i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-510:amd64                     510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-510:i386                      510.47.03-0ubuntu0.20.04.1            i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  nvidia-compute-utils-510                   510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA compute utilities
ii  nvidia-dkms-510                            510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA DKMS package
ii  nvidia-driver-510                          510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA driver metapackage
ii  nvidia-kernel-common-510                   510.47.03-0ubuntu0.20.04.1            amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-510                   510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA kernel source package
ii  nvidia-prime                               0.8.16~0.20.04.1                      all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            470.57.01-0ubuntu0.20.04.2            amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-510                           510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                    0.18build1                            all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-510              510.47.03-0ubuntu0.20.04.1            amd64        NVIDIA binary Xorg driver

loicus@loicus-DA:~$ dkms status
nvidia, 510.47.03, 5.16.0-7.2-liquorix-amd64, x86_64: installed

I rebooted at this point

loicus@loicus-DA:~$ sudo nvidia-smi
No devices were found

Please create a new nvidia-bug-report.log

nvidia-bug-report.log (1.4 MB)

Same error.
I guess you’ll now have to cross-check whether the GPU is dead by installing Windows.

Arf… I guess I can pick any version?

Just use a Windows 10 image from Microsoft, fetch the NVIDIA drivers from the Dell website, install them, and check whether Windows Device Manager reports “Code 43”.

I do indeed get a Code 43 after installing the latest driver and rebooting.
If I try to open the NVIDIA Control Panel, nothing happens, and when I try to open the NVIDIA RTX Desktop Manager I get an error saying that I need at least one RTX GPU.

I guess this confirms that the GPU is dead?

Thanks for helping
Loic

Yes, it’s dead, sorry. Hope your device is still under warranty.

The computer (and its GPU) is brand new…
what a shame that Dell sold me this…

Thanks a lot for your help generix,
I will now start hassling the sales team at Dell.