NVIDIA driver won't load on A10 on Ubuntu 22.04, in external Razer Thunderbolt 3 chassis

I have an Intel NUC with an external Razer Chroma Thunderbolt 3 chassis.
In the chassis I have an NVIDIA A10 card. Before all my problems I had a GT 1030 in it, and that worked great.
Now I’m only getting:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
So I reinstalled the system to start fresh.
I have tried to install the following drivers from the Ubuntu repo:
nvidia-driver-470
nvidia-driver-470-open
nvidia-driver-525
nvidia-driver-525-open
nvidia-driver-530

I have tried multiple things, but I can’t get the drivers to work.
Every time I changed drivers, I first ran:

sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^cuda-.*'

Output of lspci:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 05)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8) (rev 05)
00:01.2 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x4) (rev 05)
00:02.0 Display controller: Intel Corporation HD Graphics 630 (rev 04)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:15.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #0 (rev 31)
00:15.1 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #1 (rev 31)
00:15.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO I2C Controller #2 (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation HM170/QM170 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.1 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #2 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1e.0 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Serial IO UART #0 (rev 31)
00:1f.0 ISA bridge: Intel Corporation HM175 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation CM238 HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 XT [Radeon RX Vega M GH] (rev c0)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 HDMI Audio
02:00.0 USB controller: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller
03:00.0 SD Host controller: O2 Micro, Inc. SD/MMC Card Reader Controller (rev 01)
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
07:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
08:00.0 System peripheral: Intel Corporation JHL6540 Thunderbolt 3 NHI (C step) [Alpine Ridge 4C 2016] (rev 02)
09:00.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
0a:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02)
0b:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)

I’m not running any WM or anything, just a plain standard Ubuntu.

What do I need to do to install the drivers for an NVIDIA A10 card?

nvidia-bug-report.log.gz (2.6 MB)

That won’t work. The A10 wants a very large BAR1 memory window, which most if not all BIOSes won’t provide over Thunderbolt. Furthermore, the A10 doesn’t have its own fans but relies on the case to provide them, blowing air through it. The Razer case isn’t built for that.

Thanks for the answer, but where can I see the requirements for BAR1 memory? As it stands, I have an A10 I can’t use.
I have fixed the cooling with fans.

You should first uninstall the NVIDIA driver, since it floods the logs. After a reboot, dmesg contains the info about the memory windows, e.g. like this:

[    1.032610] pci 0000:17:00.0: [10de:2235] type 00 class 0x030200
[    1.032619] pci 0000:17:00.0: reg 0x10: [mem 0x9c000000-0x9cffffff]
[    1.032626] pci 0000:17:00.0: reg 0x14: [mem 0x21e000000000-0x21efffffffff 64bit pref]
[    1.032634] pci 0000:17:00.0: reg 0x1c: [mem 0x21f000000000-0x21f001ffffff 64bit pref]

In that case, it’s an A40, which wants three windows of sizes 16M, 64G, and 32M.
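For anyone who wants to do the same arithmetic on their own log, here is a minimal Python sketch that turns the `reg 0x..: [mem start-end]` lines into sizes. It assumes the dmesg line format shown in this thread (other kernel versions may print the BARs differently), and `bar_sizes` and `human` are hypothetical helper names, not part of any tool.

```python
import re

# Matches the address range in kernel lines such as
# "pci 0000:17:00.0: reg 0x14: [mem 0x21e000000000-0x21efffffffff 64bit pref]"
# Assumption: the log uses the "reg 0x..: [mem start-end ...]" format shown above.
BAR_RE = re.compile(r"reg (0x[0-9a-f]+): \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def bar_sizes(dmesg_text):
    """Return a list of (register offset, size in bytes), one per BAR line."""
    sizes = []
    for match in BAR_RE.finditer(dmesg_text):
        reg, start, end = match.groups()
        # The printed range is inclusive, hence the +1.
        sizes.append((reg, int(end, 16) - int(start, 16) + 1))
    return sizes


def human(size):
    """Format a byte count with binary units (K = 1024 bytes, and so on)."""
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:g}{unit}"
        size /= 1024


# The three BAR lines quoted in the post above.
sample = """\
[    1.032619] pci 0000:17:00.0: reg 0x10: [mem 0x9c000000-0x9cffffff]
[    1.032626] pci 0000:17:00.0: reg 0x14: [mem 0x21e000000000-0x21efffffffff 64bit pref]
[    1.032634] pci 0000:17:00.0: reg 0x1c: [mem 0x21f000000000-0x21f001ffffff 64bit pref]
"""

for reg, size in bar_sizes(sample):
    print(reg, human(size))
```

To run it on a real machine, the output of `sudo dmesg` could be passed to `bar_sizes` in place of `sample`.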

Thanks for the answer. That tells me the card wants 3 blocks of memory, but how could I have seen that before I bought the card?

I have now removed the A10 card and put in an A5000 card.
I removed and reinstalled nvidia-driver-525,
and now nvidia-smi returns a proper answer.
How can it be that an A10 card has different requirements? They are both 24 GB cards.

The GT 1030 and RTX A5000 are graphics devices; the A10 is a compute device.
Tesla-style compute devices have always had that requirement: they want to map their full memory into the CPU address space.
There are also special Teslas which only want a 256 MB BAR1, but those probably have a special VBIOS, IDK.
Then there are newer hybrid devices which are switchable using the display mode switcher. The A10 is not.
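As a rough way to estimate the window size before buying, here is a small Python sketch. It rests on two assumptions: that a compute card maps all of its memory through BAR1, as described above, and that the window is rounded up to the next power of two, since PCI BAR sizes are powers of two. `min_bar_size` is a hypothetical helper name, and the memory sizes are treated as GiB.

```python
GIB = 1024 ** 3


def min_bar_size(memory_bytes):
    """Smallest power of two that is >= memory_bytes."""
    size = 1
    while size < memory_bytes:
        size *= 2
    return size


# Assumption: the card maps its full memory, so the BAR must cover all of it.
for name, mem_gib in (("A10, 24 GB", 24), ("A40, 48 GB", 48)):
    print(name, "->", min_bar_size(mem_gib * GIB) // GIB, "GiB window")
```

Under those assumptions a 48 GB card comes out at 64 GiB, which matches the 64G window in the A40 log earlier in the thread, and a 24 GB A10 would come out at 32 GiB.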