Tesla K80 Installation Issue

Hi!

I installed a Tesla K80 on an ASRock Z390 Extreme 4 motherboard running the latest available BIOS. A GeForce GTX 1080 Ti is also installed as the main video card.

Problem:

(base) me@me1:~$ dmesg | grep NVRM
[ 1.488315] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
[ 1.488315] NVRM: The system BIOS may have misconfigured your GPU.
[ 1.488327] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:05:00.0)
[ 1.488327] NVRM: The system BIOS may have misconfigured your GPU.
[ 1.488333] NVRM: The NVIDIA probe routine failed for 2 device(s).
[ 1.488333] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019
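Both failures have the same signature: the driver sees BAR0 with size 0 at address 0x0 on each of the K80's two GK210 GPUs (04:00.0 and 05:00.0), which matches the "probe routine failed for 2 device(s)" line. Here is a minimal Python sketch for pulling those entries out of dmesg text. It assumes the message format shown above, which may differ between driver versions. The helper name invalid_bars is hypothetical.

```python
import re

# Continuation line NVRM prints after "This PCI I/O region ... is invalid:"
BAD_BAR = re.compile(
    r"NVRM: (BAR\d+) is (\d+)M @ (0x[0-9a-fA-F]+) \(PCI:([0-9a-fA-F:.]+)\)"
)


def invalid_bars(dmesg_text):
    """Return (pci_address, bar, size_in_M, base) for each BAR the driver rejected."""
    found = []
    for m in BAD_BAR.finditer(dmesg_text):
        bar, size, base, pci = m.groups()
        found.append((pci, bar, int(size), int(base, 16)))
    return found


# Excerpt of the log posted in this thread
LOG = """\
[ 1.488315] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
[ 1.488327] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:05:00.0)
"""

if __name__ == "__main__":
    for pci, bar, size, base in invalid_bars(LOG):
        print(f"{pci}: {bar} is {size}M at {base:#x}")
```

To check a live system, replace LOG with the output of dmesg. Reading dmesg may require root on some distributions.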

Already tried:

  1. Enabled “Above 4G support” in the BIOS
  2. Tried the acpi=off kernel option
  3. Tried the pci=nocrs,noearly kernel options (also in combination with acpi=off)

The message stays the same.
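Because the message did not change, it may be worth confirming that the options were actually active on the booted kernel. On Linux, the running kernel's command line is exposed in /proc/cmdline. Below is a minimal Python sketch that checks for the acpi= and pci= options listed above. It assumes a standard Linux system, and the helper names are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Tuple


def parse_cmdline(cmdline: str) -> List[Tuple[str, Optional[str]]]:
    """Split a kernel command line into (name, value) pairs; bare flags get None."""
    pairs = []
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        pairs.append((name, value if sep else None))
    return pairs


def pci_options(cmdline: str) -> List[str]:
    """Collect every comma-separated pci= sub-option (pci= may appear more than once)."""
    opts = []
    for name, value in parse_cmdline(cmdline):
        if name == "pci" and value:
            opts.extend(v for v in value.split(",") if v)
    return opts


def acpi_value(cmdline: str) -> Optional[str]:
    """Return the last acpi= value, or None if acpi= is absent."""
    found = None
    for name, value in parse_cmdline(cmdline):
        if name == "acpi":
            found = value
    return found


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # only present on Linux
    if cmdline_file.exists():
        line = cmdline_file.read_text()
        print("acpi =", acpi_value(line), "| pci options =", pci_options(line))
```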

The GeForce is detected, but the Tesla is not:

$ nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-4ba37c41-4de6-5e9a-2826-9dda420c2a4d)

Other info:

$ lspci -vvv | grep -i -A 20 nvidia

04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 17
Region 1: Memory at 1400000000 (64-bit, prefetchable)
Region 3: Memory at 1200000000 (64-bit, prefetchable)
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

05:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 17
Region 1: Memory at 1800000000 (64-bit, prefetchable)
Region 3: Memory at 1c00000000 (64-bit, prefetchable)
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #16, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
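In the lspci dump, each GPU shows Region 1 and Region 3 as 64-bit prefetchable memory mapped above the 4 GiB boundary (0x1400000000 is 80 GiB). However, neither GPU shows a Region 0 line, which is consistent with the driver's report that BAR0 is 0M @ 0x0. The following minimal Python sketch extracts the Region lines per device and flags the missing ones. It assumes the lspci -vvv text format shown above. The helper name regions_by_device is hypothetical, and the expected set (0, 1, 3) is inferred from the dmesg and lspci output in this thread.

```python
import re

# "Region N: Memory at <hex base> (...)"; an unassigned region would not match
REGION = re.compile(r"Region (\d+): Memory at ([0-9a-fA-F]+) ")
# Device header lines start with bus:device.function, e.g. "04:00.0 "
HEADER = re.compile(r"([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) ")
FOUR_GIB = 1 << 32
EXPECTED = (0, 1, 3)  # BAR0 from dmesg, Regions 1 and 3 from lspci


def regions_by_device(lspci_text):
    """Map each PCI address to {region_number: base_address}."""
    devices = {}
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        head = HEADER.match(line)
        if head:
            current = head.group(1)
            devices[current] = {}
            continue
        region = REGION.search(line)
        if region and current is not None:
            devices[current][int(region.group(1))] = int(region.group(2), 16)
    return devices


# Region lines copied from the lspci output above
LSPCI = """\
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Region 1: Memory at 1400000000 (64-bit, prefetchable)
Region 3: Memory at 1200000000 (64-bit, prefetchable)
05:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
Region 1: Memory at 1800000000 (64-bit, prefetchable)
Region 3: Memory at 1c00000000 (64-bit, prefetchable)
"""

if __name__ == "__main__":
    for dev, regions in regions_by_device(LSPCI).items():
        missing = [n for n in EXPECTED if n not in regions]
        above_4g = all(base >= FOUR_GIB for base in regions.values())
        print(f"{dev}: missing regions {missing}, assigned regions above 4 GiB: {above_4g}")
```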

Please help

Hi, I think you should review this. It’s not for your specific board, but the principles apply.

You need to find the PCIe MMIOHBase option in your BIOS and make sure it is between 32 and 42 bits in size. MMIOHBase may be a hidden option that requires a BIOS mod to make it visible.
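To put the 32 to 42 bit range in concrete terms, here is a small Python sketch of the arithmetic. It assumes the bit count refers to the physical address width of the high MMIO window; this is an interpretation, not something stated in the reply. The sketch converts 2^32 and 2^42 bytes to GiB and computes how many address bits the BAR bases from the question's lspci output need. The sizes of those regions were not included in the paste, so only the base addresses are checked.

```python
# Assumption: "N bits" means the MMIO-high window is reachable with N-bit addresses.
ASSIGNED = {  # Region bases copied from the lspci output in the question
    "04:00.0 Region 1": 0x1400000000,
    "04:00.0 Region 3": 0x1200000000,
    "05:00.0 Region 1": 0x1800000000,
    "05:00.0 Region 3": 0x1C00000000,
}


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / (1 << 30)


def bits_needed(addr: int) -> int:
    """Smallest N such that addr < 2**N (at least 1)."""
    return max(addr.bit_length(), 1)


if __name__ == "__main__":
    for bits in (32, 42):
        print(f"2**{bits} bytes = {gib(1 << bits):,.0f} GiB")
    for name, base in ASSIGNED.items():
        print(f"{name}: {base:#x} = {gib(base):.0f} GiB, needs {bits_needed(base)} address bits")
```

With the bases shown, every assigned region starts below 2^37 bytes (128 GiB), that is, between the 4 GiB and 4,096 GiB marks.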

Hello @roelmcbs2! Did you solve the reported issue? I want to install the K80 on a similar motherboard (Extreme 4, X99), but I am worried about incompatibilities. Please advise! Thanks a lot!