Hi! I switched from a 1050 Ti to an A6000 in our server. I ran into the thread "RTX A6000 on Ubuntu 20.04 - SMI: No Devices Were Found", which describes a very similar issue, especially errors of this type in the kernel log:
[ 0.201660] pci 0000:2d:00.0: BAR 1: no space for [mem size 0x1000000000 64bit pref]
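To put a number on that message: the requested size is hexadecimal, and 0x1000000000 works out to 64 GiB. Below is a minimal Python sketch for decoding such lines; the function name `bar_request_size` is my own and not part of any tool, and the regex only covers this one message format.

```python
import re

def bar_request_size(line):
    """Return (bar_index, size_in_bytes) from a kernel 'no space for' PCI message.

    Returns None if the line does not match the expected format.
    """
    m = re.search(r"BAR (\d+): no space for \[mem size (0x[0-9a-fA-F]+)", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2), 16)

line = "[ 0.201660] pci 0000:2d:00.0: BAR 1: no space for [mem size 0x1000000000 64bit pref]"
bar, size = bar_request_size(line)
print(f"BAR {bar} requests {size / 2**30:.0f} GiB")  # BAR 1 requests 64 GiB
```

A request that large needs address space above 4 GiB, which is why the "above 4G decoding" BIOS option comes up in this context.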
Consequently, I:
- Switched from CSM to UEFI (more precisely, disabled CSM; Secure Boot is off)
- Enabled "Above 4G Decoding"
- Used displaymodeselector to switch to the 256 MB setting (but imho this is the default and was already active)
I also tried both the 515 and 525 drivers (each time completely purging the old driver, then installing with apt).
Any help is greatly appreciated!!
Edit: I typically run the server headless, but with a monitor plugged in this error sometimes appears:
module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000000832a0eef, val ffffffffc337081e
module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000000832a0eef, val ffffffffc337081e
module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000009a257768, val ffffffffc68f881e
Some details:
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
sudo lshw -c display
*-display
description: VGA compatible controller
product: GA102GL [RTX A6000]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:21:00.0
logical name: /dev/fb0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller cap_list fb
configuration: depth=32 latency=0 mode=1920x1080 visual=truecolor xres=1920 yres=1080
resources: iomemory:2bf0-2bef iomemory:2bf0-2bef memory:d0000000-d0ffffff memory:2bf70000000-2bf7fffffff memory:2bf68000000-2bf69ffffff ioport:2000(size=128) memory:d1000000-d107ffff
*-display
description: VGA compatible controller
product: ASPEED Graphics Family
vendor: ASPEED Technology, Inc.
physical id: 0
bus info: pci@0000:42:00.0
logical name: /dev/fb0
version: 41
width: 32 bits
clock: 33MHz
capabilities: pm msi vga_controller cap_list rom fb
configuration: depth=32 driver=ast latency=0 resolution=1920,1080
resources: irq:273 memory:d9000000-d9ffffff memory:da000000-da01ffff ioport:4000(size=128) memory:c0000-dffff
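The memory ranges on the A6000's resources line can be converted into sizes to see which regions were actually assigned. This is a rough Python sketch (the helper name `memory_region_sizes` is my own); it only looks at `memory:` tokens and skips the `iomemory:` and `ioport:` entries.

```python
def memory_region_sizes(resources):
    """Return the size in bytes of each 'memory:START-END' range in an lshw resources line."""
    sizes = []
    for token in resources.split():
        # Only plain "memory:" tokens; "iomemory:" and "ioport:" are skipped.
        if token.startswith("memory:"):
            start, end = token[len("memory:"):].split("-")
            sizes.append(int(end, 16) - int(start, 16) + 1)
    return sizes

resources = (
    "iomemory:2bf0-2bef iomemory:2bf0-2bef memory:d0000000-d0ffffff "
    "memory:2bf70000000-2bf7fffffff memory:2bf68000000-2bf69ffffff "
    "ioport:2000(size=128) memory:d1000000-d107ffff"
)
for size in memory_region_sizes(resources):
    print(f"{size / 2**20:g} MiB")
# 16 MiB
# 256 MiB
# 32 MiB
# 0.5 MiB
```

On the values above, a 256 MiB region is mapped at an address above 4 GiB, which would be consistent with the 256 MB display mode setting. Which range corresponds to which BAR is not stated in the lshw output, so treat that mapping as an assumption.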
dkms status
iser/4.9, 5.4.0-135-generic, x86_64: installed
isert/4.9, 5.4.0-135-generic, x86_64: installed
kernel-mft-dkms/4.15.1, 5.15.0-56-generic, x86_64: installed
kernel-mft-dkms/4.15.1, 5.4.0-135-generic, x86_64: installed
knem/1.1.3.90mlnx1, 5.15.0-56-generic, x86_64: installed
knem/1.1.3.90mlnx1, 5.4.0-135-generic, x86_64: installed
mlnx-ofed-kernel/4.9, 5.4.0-135-generic, x86_64: installed
mlnx-rdma-rxe/4.9, 5.4.0-135-generic, x86_64: installed
nvidia/525.60.11, 5.15.0-56-generic, x86_64: installed
rshim/1.18, 5.15.0-56-generic, x86_64: installed
rshim/1.18, 5.4.0-135-generic, x86_64: installed
srp/4.9, 5.4.0-135-generic, x86_64: installed
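One thing worth checking in this list: the nvidia module is only built for 5.15.0-56-generic, while several other modules are also built for 5.4.0-135-generic. If the machine were booted into the 5.4 kernel, no nvidia module would be available. Below is a small Python sketch for comparing the list against the running kernel; the function name `kernels_for_module` is my own, and the parsing assumes the "name/version, kernel, arch: state" line format shown above.

```python
import platform

def kernels_for_module(dkms_status, module):
    """Return the kernels for which `module` is reported as installed by `dkms status`."""
    kernels = []
    for line in dkms_status.strip().splitlines():
        # Assumed line format: "name/version, kernel, arch: state"
        head, _, state = line.partition(":")
        fields = [f.strip() for f in head.split(",")]
        if len(fields) < 2:
            continue
        name = fields[0].split("/")[0]
        if name == module and state.strip().startswith("installed"):
            kernels.append(fields[1])
    return kernels

dkms_status = """
knem/1.1.3.90mlnx1, 5.15.0-56-generic, x86_64: installed
knem/1.1.3.90mlnx1, 5.4.0-135-generic, x86_64: installed
nvidia/525.60.11, 5.15.0-56-generic, x86_64: installed
rshim/1.18, 5.4.0-135-generic, x86_64: installed
"""

built_for = kernels_for_module(dkms_status, "nvidia")
running = platform.release()  # same value as `uname -r` on Linux
print("nvidia built for:", built_for)
print("running kernel:", running, "(covered)" if running in built_for else "(NOT covered)")
```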
lspci -vv | grep -i nvidia
21:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GA102GL [RTX A6000]
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
21:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
Subsystem: NVIDIA Corporation GA102 High Definition Audio Controller
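Note that the grep filter hides any "Kernel driver in use:" line that does not contain "nvidia", so the output above does not show whether some other driver (or none) is bound to the card. Here is a minimal Python sketch that reads the binding from sysfs instead; the helper name `bound_driver` is my own, and the address 0000:21:00.0 is taken from the lshw output.

```python
import os

def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI device, or None if nothing is bound."""
    # On Linux, a bound device has a "driver" symlink pointing at its driver directory.
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))

print(bound_driver("0000:21:00.0"))
```

A result of `None` would mean no kernel driver has claimed the device.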
nvidia-bug-report.log.gz (154.6 KB)