nvidia-smi "No devices were found"

What I’ve tried:

  • Adding pci=nocrs,realloc,rom to the kernel command line in /etc/default/grub
    • This got me from “ERROR: could not insert ‘nvidia’: No such device” to “No devices were found”.
  • echo 1 > /sys/bus/pci/devices/0000\:00\:01.0/remove ; echo 1 > /sys/bus/pci/rescan
    • Thread 197768
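
The remove-and-rescan step listed above can also be scripted. This is only a minimal sketch of the same two sysfs writes; the function name and the `sysfs_root` parameter are hypothetical and exist so the logic can be tried against a scratch directory first. On a real system it has to run as root.

```python
from pathlib import Path


def remove_and_rescan(device: str, sysfs_root: str = "/sys") -> None:
    """Detach one PCI device, then ask the kernel to re-enumerate the bus.

    Mirrors:
        echo 1 > /sys/bus/pci/devices/<device>/remove
        echo 1 > /sys/bus/pci/rescan
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    remove_node = pci / "devices" / device / "remove"
    rescan_node = pci / "rescan"

    # Writing "1" to `remove` drops the device from the kernel's view
    # (including anything behind it, if the device is a bridge).
    remove_node.write_text("1")
    # Writing "1" to `rescan` makes the kernel probe the bus again, giving
    # it another chance to assign resources to whatever it rediscovers.
    rescan_node.write_text("1")
```

For the device in the command above, that would be `remove_and_rescan("0000:00:01.0")`.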

Hardware I have:

  • 2013 Mac Pro “Trashcan”
  • Ubuntu 20.04 with all packages updated
  • eGPU bridge 5e:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06)
  • NVIDIA RTX A5000

Why can’t nvidia-smi see my GPU even though /dev/nvidia0 exists? :(

I’ve attached the nvidia-bug-report.log.gz to this post.
nvidia-bug-report.log.gz (128.5 KB)

Also, these messages keep repeating in my dmesg log, regardless of which driver I install (470, 515, 525):

[  709.843811] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.161.03  Wed Oct 19 00:10:36 UTC 2022
[  709.862647] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.161.03  Wed Oct 19 00:05:15 UTC 2022
[  709.869723] [drm] [nvidia-drm] [GPU ID 0x00001900] Loading driver
[  709.869727] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:19:00.0 on minor 2
[  709.896517] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[  709.900037] nvidia-uvm: Loaded the UVM driver, major device number 504.
[  709.919576] [drm] [nvidia-drm] [GPU ID 0x00001900] Unloading driver
[  709.960842] nvidia-modeset: Unloading
[  709.989640] nvidia-uvm: Unloaded the UVM driver.
[  710.026059] nvidia-nvlink: Unregistered the Nvlink Core, major device number 507
[  710.653570] nvidia-nvlink: Nvlink Core is being initialized, major device number 507

[  710.713327] nvidia 0000:19:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=none
[  712.220813] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.161.03  Wed Oct 19 00:10:36 UTC 2022
[  712.252962] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.161.03  Wed Oct 19 00:05:15 UTC 2022
[  712.261056] [drm] [nvidia-drm] [GPU ID 0x00001900] Loading driver
[  712.261060] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:19:00.0 on minor 2
[  712.290075] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[  712.293589] nvidia-uvm: Loaded the UVM driver, major device number 504.
[  712.315195] [drm] [nvidia-drm] [GPU ID 0x00001900] Unloading driver
[  712.344952] nvidia-modeset: Unloading
[  712.383611] nvidia-uvm: Unloaded the UVM driver.
[  712.406006] nvidia-nvlink: Unregistered the Nvlink Core, major device number 507

Please check if this helps:
https://forums.developer.nvidia.com/t/k-ubuntu-22-10-not-booting-kernel-oops-for-driver-450-with-egpu/235008/3?u=generix

Thanks for the reply! I’ll try a version of the 470 driver in the range you mentioned. Would it work with an Ampere (A-series) card?

Support for the RTX A5000 (desktop) was added in 460.73.01.


Awesome, thanks! I’ll try that and report back.

I attempted using driver 470, to no avail.

dmesg:

[  176.793973] nvidia: probe of 0000:82:00.0 failed with error -1
[  176.794059] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  176.794064] NVRM: None of the NVIDIA devices were initialized.
[  176.794569] nvidia-nvlink: Unregistered the Nvlink Core, major device number 511
[  177.429014] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[  177.429037] NVRM: request_mem_region failed for 0M @ 0x0. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[  177.446971] nvidia: probe of 0000:82:00.0 failed with error -1
[  177.447012] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  177.447014] NVRM: None of the NVIDIA devices were initialized.
[  177.447281] nvidia-nvlink: Unregistered the Nvlink Core, major device number 511
[  178.078269] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[  178.078282] NVRM: request_mem_region failed for 0M @ 0x0. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[  178.092757] nvidia: probe of 0000:82:00.0 failed with error -1
[  178.092789] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  178.092792] NVRM: None of the NVIDIA devices were initialized.
[  178.093034] nvidia-nvlink: Unregistered the Nvlink Core, major device number 511

I will try an -open driver with the kernel parameter nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 set.
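
Before drawing conclusions from a test like this, it may be worth confirming that the parameters actually reached the running kernel. A minimal sketch, assuming the parameters were passed on the kernel command line (readable from /proc/cmdline) rather than through a modprobe configuration file; the helper names are hypothetical.

```python
def cmdline_has(param: str, cmdline: str) -> bool:
    """Return True if `param` appears as a whole token on the command line."""
    return param in cmdline.split()


def pci_options(cmdline: str) -> list:
    """Return the comma-separated options given via pci=..., in order."""
    opts = []
    for token in cmdline.split():
        if token.startswith("pci="):
            opts.extend(token[len("pci="):].split(","))
    return opts


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

On the machine itself, `cmdline_has("nvidia.NVreg_OpenRmEnableUnsupportedGpus=1", read_cmdline())` should be true after a reboot, and `pci_options(read_cmdline())` should list nocrs, realloc and rom if the earlier GRUB change is still in effect.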

I tried the -open driver with the kernel parameter listed above. It didn’t work, and I’m not sure what to try next.

I’m using Ubuntu 24.04 server (so no GUI).

Image on tty0, dmesg below:

[  283.010648] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[  283.010660] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.050719] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR2 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.050728] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR3 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.050733] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR4 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.050737] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR5 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.051675] nvidia 0000:82:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
[  283.051731] NVRM: The NVIDIA GPU 0000:82:00.0
               NVRM: (PCI ID: 10de:24b0) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[  283.051978] nvidia: probe of 0000:82:00.0 failed with error -1
[  283.052022] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  283.052024] NVRM: None of the NVIDIA devices were initialized.
[  283.052330] nvidia-nvlink: Unregistered Nvlink Core, major device number 510
[  283.373931] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[  283.373952] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.414689] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR2 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.414693] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR3 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.414695] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR4 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.414697] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR5 is 0M @ 0x0 (PCI:0000:82:00.0)
[  283.414715] nvidia 0000:82:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=none
[  283.414773] NVRM: The NVIDIA GPU 0000:82:00.0
               NVRM: (PCI ID: 10de:24b0) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[  283.415081] nvidia: probe of 0000:82:00.0 failed with error -1
[  283.415137] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  283.415141] NVRM: None of the NVIDIA devices were initialized.
[  283.415709] nvidia-nvlink: Unregistered Nvlink Core, major device number 510
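
Since the same block repeats on every probe attempt, it can help to reduce the log to the distinct BARs the driver rejects. A minimal sketch: the regular expression is based only on the "BARn is 0M @ 0x0 (PCI:...)" wording shown above and may not match other driver versions, and the function name is hypothetical.

```python
import re

# Matches e.g. "NVRM: BAR1 is 0M @ 0x0 (PCI:0000:82:00.0)"
BAR_RE = re.compile(
    r"NVRM: BAR(\d+) is (\d+)M @ (0x[0-9a-fA-F]+) \(PCI:([0-9a-fA-F:.]+)\)"
)


def invalid_bars(dmesg_text: str) -> list:
    """Return sorted, de-duplicated (pci_address, bar_index) pairs for every
    BAR reported with both a zero size and a zero base address."""
    found = set()
    for match in BAR_RE.finditer(dmesg_text):
        bar, size_mb, addr, pci = match.groups()
        if int(size_mb) == 0 and int(addr, 16) == 0:
            found.add((pci, int(bar)))
    return sorted(found)
```

Fed the output of `dmesg`, for instance via `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, it collapses the repeated probe attempts into one entry per BAR.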

@generix sorry for the late reply, bumping this thread.

Latest attempt was with the 515-open driver:

[   83.535988] NVRM: nvAssertFailedNoLog: Assertion failed: pKernelBus->pciBars[BUS_BAR_1] != 0 @ kern_bus_gm107.c:3861
[   83.536003] NVRM: nvCheckOkFailedNoLog: Check failed: Generic Error: Invalid state [NV_ERR_INVALID_STATE] (0x00000040) returned from kbusInitBarsBaseInfo_HAL(pKernelBus) @ kern_bus.c:77
[   83.536070] NVRM: osInitNvMapping: *** Cannot attach gpu
[   83.536075] NVRM: RmInitAdapter: osInitNvMapping failed, bailing out of RmInitAdapter
[   83.536087] NVRM: GPU 0000:82:00.0: RmInitAdapter failed! (0x22:0x40:631)
[   83.537269] NVRM: GPU 0000:82:00.0: rm_init_adapter failed, device minor number 0
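
The failed assertion reads as if the driver found no address for BAR1, which can be cross-checked against what the kernel itself assigned. A minimal sketch, assuming the usual sysfs `resource` file layout (one "start end flags" hex triple per line, with the first six lines being the standard BARs). Treating "flags set, start address zero" as "present but unassigned" is my interpretation, not something confirmed in this thread, and the function names are hypothetical.

```python
def parse_resource(text: str) -> list:
    """Parse a sysfs `resource` file into one dict per well-formed line."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, flags = (int(field, 16) for field in fields)
        regions.append({"index": index, "start": start, "end": end, "flags": flags})
    return regions


def regions_without_address(regions: list, limit: int = 6) -> list:
    """Indices below `limit` whose flags are non-zero but whose start is 0.

    Assumption: an all-zero line is an unimplemented region, while non-zero
    flags with a zero start is a region the kernel could not place.
    """
    return [
        r["index"]
        for r in regions
        if r["index"] < limit and r["flags"] != 0 and r["start"] == 0
    ]
```

For the GPU in the logs above, the input would be the contents of /sys/bus/pci/devices/0000:82:00.0/resource; a non-empty result would point at PCI resource allocation on the Thunderbolt bridge rather than at the driver version.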