I tried both pci=realloc and pci=realloc=off, but I could not detect any change.
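Before comparing logs, it may be worth confirming that the option actually reached the running kernel. A minimal sketch in Python, assuming a Linux system where /proc/cmdline is readable; the helper name `pci_options` is made up for illustration:

```python
from pathlib import Path


def pci_options(cmdline):
    """Return the value of every pci= parameter on a kernel command line."""
    prefix = "pci="
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


if __name__ == "__main__":
    path = Path("/proc/cmdline")
    if path.exists():
        found = pci_options(path.read_text())
        print("pci= options seen by the kernel:", found or "none")
    else:
        # Not on Linux, or /proc is not mounted.
        print("/proc/cmdline not available")
```

If the list comes back empty after a reboot, the bootloader entry that was edited is probably not the one being booted.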
logs →
ACPI Warning: SystemIO range 0x0000000000001828-0x000000000000182F conflicts with OpRegion 0x0000000000001800-0x000000000000187F (\PMIO) (20210730/utaddress-213)
[ 3.904360] ACPI: OSL: Resource conflict; ACPI support missing from driver?
[ 3.905864] ACPI Warning: SystemIO range 0x0000000000001C40-0x0000000000001C4F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C7F (\_GPE.GPBX) (20210730/utaddress-213)
[ 3.907424] ACPI Warning: SystemIO range 0x0000000000001C40-0x0000000000001C4F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20210730/utaddress-213)
[ 3.908986] ACPI: OSL: Resource conflict; ACPI support missing from driver?
[ 3.910532] ACPI Warning: SystemIO range 0x0000000000001C30-0x0000000000001C3F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C7F (\_GPE.GPBX) (20210730/utaddress-213)
[ 3.910644] input: PC Speaker as /devices/platform/pcspkr/input/input8
[ 3.912148] ACPI Warning: SystemIO range 0x0000000000001C30-0x0000000000001C3F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C3F (\GPRL) (20210730/utaddress-213)
[ 3.912153] ACPI Warning: SystemIO range 0x0000000000001C30-0x0000000000001C3F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20210730/utaddress-213)
[ 3.912156] ACPI: OSL: Resource conflict; ACPI support missing from driver?
[ 3.912158] ACPI Warning: SystemIO range 0x0000000000001C00-0x0000000000001C2F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C7F (\_GPE.GPBX) (20210730/utaddress-213)
[ 3.912162] ACPI Warning: SystemIO range 0x0000000000001C00-0x0000000000001C2F conflicts with OpRegion 0x0000000000001C00-0x0000000000001C3F (\GPRL) (20210730/utaddress-213)
[ 3.912165] ACPI Warning: SystemIO range 0x0000000000001C00-0x0000000000001C2F conflicts with OpRegion 0x0000000000001C00-0x0000000000001FFF (\GPR) (20210730/utaddress-213)
On the Linux mailing list they said:
Unless you need to use anything on SMBus (hardware sensors, essentially)
you don’t have to worry about that one. It means that the kernel has
detected that the BIOS may potentially access the SMBus controller which
may conflict with usage of the controller from within the OS.
Could this be a problem? The K80 uses temperature sensors and can shut down above 95 °C, so might it shut down completely if the sensors can’t be read?
logs →
nvidia: module verification failed: signature and/or required key missing - tainting kernel
~> mokutil --sb-state
SecureBoot disabled
Platform is in Setup Mode
logs →
2023-06-05T22:28:44.104044+02:00 localhost kernel: [ 46.951725][T13740] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
2023-06-05T22:28:44.104045+02:00 localhost kernel: [ 46.951729][T13740] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2023-06-05T22:28:44.104046+02:00 localhost kernel: [ 46.951729][T13740] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
2023-06-05T22:28:44.104046+02:00 localhost kernel: [ 46.952168][T13740] NVRM: The system BIOS may have misconfigured your GPU.
2023-06-05T22:28:44.104048+02:00 localhost kernel: [ 46.952218][T13740] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2023-06-05T22:28:44.104048+02:00 localhost kernel: [ 46.952218][T13740] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
2023-06-05T22:28:44.104049+02:00 localhost kernel: [ 46.952220][T13740] NVRM: The system BIOS may have misconfigured your GPU.
2023-06-05T22:28:44.104050+02:00 localhost kernel: [ 46.952237][T13740] NVRM: The NVIDIA probe routine failed for 2 device(s).
2023-06-05T22:28:44.104050+02:00 localhost kernel: [ 46.952239][T13740] NVRM: None of the NVIDIA devices were initialized.
2023-06-05T22:28:44.104050+02:00 localhost kernel: [ 46.952395][T13740] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
2023-06-05T22:28:52.052043+02:00 localhost kernel: [ 54.898702][T20243] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
2023-06-05T22:28:52.052044+02:00 localhost kernel: [ 54.898705][T20243] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2023-06-05T22:28:52.052044+02:00 localhost kernel: [ 54.898705][T20243] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
2023-06-05T22:28:52.052045+02:00 localhost kernel: [ 54.899029][T20243] NVRM: The system BIOS may have misconfigured your GPU.
2023-06-05T22:28:52.052047+02:00 localhost kernel: [ 54.899042][T20243] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
2023-06-05T22:28:52.052047+02:00 localhost kernel: [ 54.899042][T20243] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
2023-06-05T22:28:52.052047+02:00 localhost kernel: [ 54.899044][T20243] NVRM: The system BIOS may have misconfigured your GPU.
2023-06-05T22:28:52.052048+02:00 localhost kernel: [ 54.899056][T20243] NVRM: The NVIDIA probe routine failed for 2 device(s).
2023-06-05T22:28:52.052048+02:00 localhost kernel: [ 54.899057][T20243] NVRM: None of the NVIDIA devices were initialized.
2023-06-05T22:28:52.052049+02:00 localhost kernel: [ 54.899180][T20243] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
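The "BAR0 is 0M @ 0x0" condition can also be inspected from user space, because Linux lists each device's regions in /sys/bus/pci/devices/<address>/resource as lines of start, end and flags in hex. A rough sketch in Python, assuming that file layout; the function names are hypothetical and the two addresses are the ones from the log above:

```python
from pathlib import Path


def parse_pci_resource(text):
    """Parse sysfs 'resource' text into (index, start, end, flags) tuples."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        regions.append((index, start, end, flags))
    return regions


def region_size(start, end):
    """Size in bytes; an all-zero entry means nothing was assigned."""
    return 0 if start == 0 and end == 0 else end - start + 1


def report(address):
    path = Path("/sys/bus/pci/devices") / address / "resource"
    if not path.exists():
        print(f"{address}: no resource file found")
        return
    # Entries 0 to 5 are the standard BARs; later entries cover other regions.
    for index, start, end, _flags in parse_pci_resource(path.read_text()):
        size = region_size(start, end)
        state = "unassigned" if size == 0 else f"{size} bytes"
        print(f"{address} region {index}: {start:#x}-{end:#x} ({state})")


if __name__ == "__main__":
    for address in ("0000:03:00.0", "0000:04:00.0"):
        report(address)
```

If region 0 shows up as unassigned for both devices, that would match what the NVRM messages report, and it gives a quick way to see whether a boot option changed anything without loading the driver.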
This sounds similar to the issue described here:
/tesla-k80-installation-issue/110336
Sadly, my BIOS does not let me change MMIOBase, so I installed a UEFI shell on a USB stick and ran some commands.
Here is part of the memmap output:
Type Start End # Pages Attributes
MMIO 00000000F8000000-00000000FBFFFFFF 0000000000004000 8000000000000001
MMIO 00000000FEC00000-00000000FEC00FFF 0000000000000001 8000000000000001
MMIO 00000000FED00000-00000000FED03FFF 0000000000000004 8000000000000001
MMIO 00000000FED1C000-00000000FED1FFFF 0000000000000004 8000000000000001
MMIO 00000000FEE00000-00000000FEE00FFF 0000000000000001 8000000000000001
MMIO 00000000FF000000-00000000FFFFFFFF 0000000000001000 8000000000000001
I don’t know how, but maybe it is possible to tell from this whether my MMIOBase is below 42 bits. The start values just look quite large to me.
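One way to answer the bit-width question is to take the highest address of each range and count how many bits it needs. A minimal sketch in Python, assuming the memmap lines are pasted in as text; the helper name `mmio_bit_lengths` is made up for illustration:

```python
import re

MEMMAP_SAMPLE = """\
MMIO 00000000F8000000-00000000FBFFFFFF 0000000000004000 8000000000000001
MMIO 00000000FEC00000-00000000FEC00FFF 0000000000000001 8000000000000001
MMIO 00000000FED00000-00000000FED03FFF 0000000000000004 8000000000000001
MMIO 00000000FED1C000-00000000FED1FFFF 0000000000000004 8000000000000001
MMIO 00000000FEE00000-00000000FEE00FFF 0000000000000001 8000000000000001
MMIO 00000000FF000000-00000000FFFFFFFF 0000000000001000 8000000000000001
"""

# Type, then "start-end" as two hex numbers joined by a hyphen.
LINE_RE = re.compile(r"^(\w+)\s+([0-9A-Fa-f]+)-([0-9A-Fa-f]+)\s")


def mmio_bit_lengths(text):
    """Return (start, end, bits) per MMIO line; bits is what the end address needs."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != "MMIO":
            continue
        start = int(match.group(2), 16)
        end = int(match.group(3), 16)
        rows.append((start, end, end.bit_length()))
    return rows


if __name__ == "__main__":
    for start, end, bits in mmio_bit_lengths(MEMMAP_SAMPLE):
        print(f"{start:#x}-{end:#x}: {bits} bits, below 4 GiB: {end < 2**32}")
```

For the six ranges listed, every end address fits in 32 bits, so they all sit below 4 GiB and far below a 42-bit limit; the leading zeros only make the numbers look large. This covers just the ranges memmap printed and says nothing about where the firmware would place a large 64-bit BAR.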
ChatGPT suggested that dmpstore would be useful, but I could not get any information out of its output.
dmpstore2.txt (127.2 KB)