RTX A6000 on Ubuntu 20.04 - SMI: No Devices Were Found

CPU - Ryzen 5600X
OS - Ubuntu 20.04
Linux Kernel - 5.15
GPU - RTX A6000
CUDA - 14.3

Hi All,

I am trying to get an RTX A6000 up and running on my system. From what I can tell, the driver is loaded and the system knows the display is there, but no device is found when I run SMI. I am not sure how to proceed and any guidance would be appreciated.

I have tried updating my Linux kernel and installing different versions of CUDA, but neither made much of a difference. Likewise, changing xorg.conf and blacklisting the nouveau driver did not help either.

gsworkstationuser05@scgsworkstation05:~$ sudo lshw -c display
*-display
description: VGA compatible controller
product: ASPEED Graphics Family
vendor: ASPEED Technology, Inc.
physical id: 0
bus info: pci@0000:29:00.0
version: 41
width: 32 bits
clock: 33MHz
capabilities: pm msi vga_controller bus_master cap_list
configuration: driver=ast latency=0
resources: irq:39 memory:c2000000-c2ffffff memory:c3000000-c301ffff ioport:d000(size=128) memory:c0000-dffff
*-display
description: 3D controller
product: NVIDIA Corporation
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:2d:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm bus_master cap_list
configuration: driver=nvidia latency=0
resources: irq:81 memory:c5000000-c5ffffff memory:c6000000-c7ffffff

gsworkstationuser05@scgsworkstation05:~$ dkms status
nvidia, 470.82.01, 5.15.13-051513-generic, x86_64: installed

gsworkstationuser05@scgsworkstation05:~$ nvidia-smi
No devices were found

gsworkstationuser05@scgsworkstation05:~$ nvidia-settings
Unable to init server: Could not connect: Connection refused

ERROR: The control display is undefined; please run nvidia-settings --help for usage information.

gsworkstationuser05@scgsworkstation05:~$ lspci -vv | grep -i nvidia
2d:00.0 3D controller: NVIDIA Corporation Device 2230 (rev a1)
Subsystem: NVIDIA Corporation Device 1459
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

nvidia-bug-report.log.gz (117.4 KB)


Please either enable “Above 4G Decoding” and disable CSM in the BIOS, then reinstall the OS in EFI mode (recommended), or use the Display Mode Selector tool to set a 256MB BAR.
https://developer.nvidia.com/displaymodeselector
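
To confirm which mode the installed OS actually booted in, something like the following may help. It is only a minimal sketch, assuming the usual Linux convention that the /sys/firmware/efi directory exists only when the kernel was started through UEFI. The function name booted_in_uefi is made up for illustration.

```python
import os

def booted_in_uefi(efi_dir="/sys/firmware/efi"):
    """Return True if the EFI firmware directory exists.

    Assumption: on Linux this directory is only present when the
    kernel was booted via UEFI rather than legacy BIOS/CSM.
    """
    return os.path.isdir(efi_dir)

if __name__ == "__main__":
    mode = "UEFI" if booted_in_uefi() else "legacy BIOS/CSM"
    print(f"Booted in {mode} mode")
```

If this reports legacy mode after changing the BIOS settings, the OS was most likely installed in CSM mode and would need the reinstall mentioned above.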

Thanks generix! That was able to resolve my issues.

Can you elaborate on what led you to that recommendation? I didn’t notice anything in the log that stood out. Likewise, I didn’t think running nvidia-smi on an A6000 would count as being outside of the default configuration.

This:

[    0.201660] pci 0000:2d:00.0: BAR 1: no space for [mem size 0x1000000000 64bit pref]
[    0.201662] pci 0000:2d:00.0: BAR 1: failed to assign [mem size 0x1000000000 64bit pref]
[    0.201663] pci 0000:2d:00.0: BAR 8: no space for [mem size 0x1000000000 64bit pref]
[    0.201664] pci 0000:2d:00.0: BAR 8: failed to assign [mem size 0x1000000000 64bit pref]
[    0.201666] pci 0000:2d:00.0: BAR 3: assigned [mem 0xc0000000-0xc1ffffff 64bit pref]
[    0.201673] pci 0000:2d:00.0: BAR 10: no space for [mem size 0x40000000 64bit pref]
[    0.201674] pci 0000:2d:00.0: BAR 10: failed to assign [mem size 0x40000000 64bit pref]
[    0.201676] pci 0000:2d:00.0: BAR 0: no space for [mem size 0x01000000]
[    0.201677] pci 0000:2d:00.0: BAR 0: failed to assign [mem size 0x01000000]
[    0.201678] pci 0000:2d:00.0: BAR 7: no space for [mem size 0x00800000]
[    0.201679] pci 0000:2d:00.0: BAR 7: failed to assign [mem size 0x00800000]

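The A6000 was set to require a 64GB address space to map the video memory (the BAR 1 request of size 0x1000000000), which the CSM is unable to provide. Because the BARs could not be assigned, the driver loads but cannot access the GPU, so nvidia-smi reports no devices.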
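
For anyone who wants to spot this in their own log, here is a rough sketch that pulls the "failed to assign" lines out of dmesg text and converts the hex sizes to readable units. The regex is only modeled on the lines quoted above, and the names failed_bars and human_size are illustrative, not part of any NVIDIA or kernel tool.

```python
import re

# Modeled on kernel messages like:
# pci 0000:2d:00.0: BAR 1: failed to assign [mem size 0x1000000000 64bit pref]
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): failed to assign "
    r"\[mem size 0x(?P<size>[0-9a-f]+)"
)

def human_size(n):
    """Convert a byte count to a binary-unit string, e.g. 64 GiB."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:g} {unit}"
        n /= 1024

def failed_bars(log_text):
    """Return (device, BAR index, size in bytes) for each failed assignment."""
    return [
        (m["dev"], int(m["bar"]), int(m["size"], 16))
        for m in BAR_FAIL.finditer(log_text)
    ]

# Two lines copied from the log excerpt in this thread.
SAMPLE = """\
[    0.201662] pci 0000:2d:00.0: BAR 1: failed to assign [mem size 0x1000000000 64bit pref]
[    0.201677] pci 0000:2d:00.0: BAR 0: failed to assign [mem size 0x01000000]
"""

if __name__ == "__main__":
    for dev, bar, size in failed_bars(SAMPLE):
        print(f"{dev} BAR {bar}: {human_size(size)}")
    # 0000:2d:00.0 BAR 1: 64 GiB
    # 0000:2d:00.0 BAR 0: 16 MiB
```

To run it against a live system, the output of dmesg could be passed to failed_bars in place of SAMPLE. A 64 GiB request on BAR 1 that fails like this is the pattern that points at the CSM / Above 4G Decoding setting.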
