Problems with memory management on boot with Tesla K80

I understand that I am running the Tesla K80 outside its preferred environment, but I have seen regular consumers use this card successfully, so I tried it myself.

I followed this, with a slightly more up-to-date Ubuntu version (Ubuntu 22.04.2 LTS) and the driver from Ubuntu's Additional Drivers section (NVIDIA server 470 driver), but the graphics card's memory is unassigned:

/sbin/lspci -d "10de:*" -v -xxx

03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation Device 106c
	Flags: fast devsel, IRQ 16
	Memory at <unassigned> (64-bit, prefetchable) [disabled]
	Memory at <unassigned> (64-bit, prefetchable) [disabled]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel modules: nouveau, nvidia_drm, nvidia
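
To make outputs like this easier to compare between boots, here is a minimal Python sketch that pulls the "Memory at" lines out of `lspci -v` text and flags the unassigned ones. The function names and the SAMPLE string are my own for illustration (the sample lines are copied from the lspci outputs in this post); this is not part of any NVIDIA or pciutils tooling.

```python
import re

# Matches the "Memory at ..." lines printed by `lspci -v`.
BAR_LINE = re.compile(
    r"^\s*Memory at (?P<addr>\S+) \((?P<attrs>[^)]*)\)(?P<rest>.*)$"
)

# Illustrative input: lines copied from lspci outputs in this post.
SAMPLE = (
    "\tFlags: fast devsel, IRQ 16\n"
    "\tMemory at <unassigned> (64-bit, prefetchable) [disabled]\n"
    "\tMemory at 400000000 (64-bit, prefetchable) [size=16G]\n"
    "\tMemory at f0000000 (32-bit, non-prefetchable) [size=16M]\n"
)


def parse_bars(lspci_text):
    """Return one dict per 'Memory at' line found in `lspci -v` text."""
    bars = []
    for line in lspci_text.splitlines():
        match = BAR_LINE.match(line)
        if not match:
            continue  # not a memory region line
        size = re.search(r"\[size=([^\]]+)\]", match.group("rest"))
        bars.append({
            "address": match.group("addr"),
            "attributes": match.group("attrs"),
            "size": size.group(1) if size else None,
            "assigned": match.group("addr") != "<unassigned>",
            "disabled": "[disabled]" in match.group("rest"),
        })
    return bars


def unassigned_bars(lspci_text):
    """Return only the regions that lspci reports as <unassigned>."""
    return [bar for bar in parse_bars(lspci_text) if not bar["assigned"]]


if __name__ == "__main__":
    for bar in parse_bars(SAMPLE):
        print(bar)
```

On a real system the text could come from the lspci command shown above, for example by saving its output to a file and passing the file contents to `parse_bars`.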

Above 4G decoding is enabled and Secure Boot is disabled, but I had the feeling that Above 4G decoding did not quite work, so I added the kernel parameter

pci=nocrs
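
To confirm that a `pci=` option really reached the running kernel, the command line can be read back from /proc/cmdline. The following is a small sketch, assuming the usual comma-separated form `pci=option[,option...]`; the function names and the example command line in the comment are hypothetical.

```python
def pci_options(cmdline):
    """Collect every option passed through pci=... on a kernel command line."""
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # Options after pci= are comma-separated, e.g. pci=nocrs,realloc
            options.extend(opt for opt in token[len("pci="):].split(",") if opt)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()


if __name__ == "__main__":
    # Hypothetical command line, used only to show the parsing.
    example = "root=/dev/sda1 ro quiet pci=nocrs,realloc"
    print(pci_options(example))
```

On the machine itself, `pci_options(read_cmdline())` would list the options that are actually active after a reboot.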

With pci=nocrs, the memory looked fine:
/usr/bin/lspci -d "10de:*" -v -xxx

03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Flags: fast devsel, IRQ 16
	Memory at 400000000 (64-bit, prefetchable) [size=16G]
	Memory at 800000000 (64-bit, prefetchable) [size=32M]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
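
As a sanity check on whether regions such as 400000000 [size=16G] and 800000000 [size=32M] really sit above the 4 GiB boundary, the addresses and sizes can be converted to numbers. This is only a sketch; the helper names are made up, and it assumes lspci's K/M/G suffixes are powers of 1024.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}
FOUR_GIB = 4 * 2**30


def size_to_bytes(size):
    """Convert an lspci size string such as '16G' or '32M' to bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return int(size[:-1]) * UNITS[suffix]
    return int(size)  # plain byte count without a suffix


def is_above_4g(address_hex):
    """True if a hexadecimal address from lspci lies at or above 4 GiB."""
    return int(address_hex, 16) >= FOUR_GIB


if __name__ == "__main__":
    for address, size in [("400000000", "16G"), ("800000000", "32M")]:
        start = int(address, 16)
        end = start + size_to_bytes(size)
        print(f"{address}: above 4G = {is_above_4g(address)}, "
              f"range {start:#x}-{end:#x}")
```

With these values, the 16G region ends exactly where the 32M region starts, so the two do not overlap.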

bug-report-without-realloc.log.gz (880.7 KB)

But the NVIDIA persistence daemon (nvidia-persistenced) still did not work, there was no nvidia* in /dev/,
and nvidia-smi could not find any device.
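
A quick way to check for the nvidia* device nodes after each reboot is sketched below. The function names are my own, and the list of names in the demo is hypothetical; on a real system the names come from listing /dev.

```python
import fnmatch
import os


def nvidia_nodes(names):
    """Return the sorted entries whose name matches nvidia*."""
    return sorted(name for name in names if fnmatch.fnmatch(name, "nvidia*"))


def list_nvidia_nodes(dev_dir="/dev"):
    """List the nvidia* entries directly under dev_dir."""
    return nvidia_nodes(os.listdir(dev_dir))


if __name__ == "__main__":
    # Hypothetical directory listing, used only to show the filtering.
    print(nvidia_nodes(["null", "nvidiactl", "sda", "nvidia0"]))
```

An empty result from `list_nvidia_nodes()` corresponds to the situation described here, where no nvidia* entry exists in /dev/.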

The logs told me to enable pci=realloc, so I did, but this did not solve my problem. The nvidia* devices appeared in /dev/, but the memory assignment was broken again:

03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at f0000000 (32-bit, non-prefetchable) [size=16M]
	Memory at <unassigned> (64-bit, prefetchable)
	Memory at <unassigned> (64-bit, prefetchable)
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

nvidia-bug-report-with-realloc.log.gz (888.4 KB)

The only other graphics driver I have installed is i915, but since the Tesla K80 does not have a graphics output, I prefer to keep it enabled.