I’m having trouble running my RTX A6000 GPUs. I have 8 of them in an external PCIe chassis connected to my server, which runs Ubuntu 22.04 LTS. Whenever I start the server, dmesg fills with error messages reporting that the PCI I/O region is invalid. I have confirmed that the server boots in UEFI mode with CSM off, and that Above 4G Decoding and SR-IOV are both enabled.
I have also looked through the forums and tried adding pci=realloc and pci=realloc=yes to my kernel parameters, but neither worked in my case. I have attached the nvidia-bug-report.sh output in case anyone here can make sense of it.
nvidia-bug-report.log.gz (9.0 MB)
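
One thing worth ruling out with the kernel parameters mentioned above is whether they actually reached the running kernel, since an edit to the bootloader config has no effect until the config is regenerated and the machine rebooted. Below is a minimal sketch of that check, assuming the standard Linux /proc/cmdline interface. The function names pci_options and read_cmdline are hypothetical helpers for illustration, not part of any NVIDIA or Ubuntu tool.

```python
def pci_options(cmdline: str) -> list[str]:
    """Return every option passed through pci= on a kernel command line.

    The kernel accepts comma-separated values (e.g. pci=realloc,nocrs),
    so each pci= token is split on commas.
    """
    options = []
    for token in cmdline.split():
        if token.startswith("pci="):
            options.extend(token[len("pci="):].split(","))
    return options


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    active = pci_options(read_cmdline())
    if active:
        print("pci= options seen by the running kernel:", ", ".join(active))
    else:
        print("No pci= option is present on the running kernel's command line.")
```

If the script reports no pci= option, the parameter never took effect and the bootloader configuration is the first thing to recheck.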