Tesla K80 Installation Issue

Hi!

I installed a Tesla K80 on an ASRock Z390 Extreme4 motherboard running the latest available BIOS. The system also has a GeForce GTX 1080 Ti as the main video card.

Problem:

(base) me@me1:~$ dmesg | grep NVRM
[ 1.488315] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
[ 1.488315] NVRM: The system BIOS may have misconfigured your GPU.
[ 1.488327] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:05:00.0)
[ 1.488327] NVRM: The system BIOS may have misconfigured your GPU.
[ 1.488333] NVRM: The NVIDIA probe routine failed for 2 device(s).
[ 1.488333] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019
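The driver reports that BAR0 of both Tesla functions (04:00.0 and 05:00.0) has no address. One way to cross-check this from the kernel's side is a minimal POSIX sh sketch like the one below. It assumes the standard sysfs layout, where `/sys/bus/pci/devices/<address>/resource` holds one `start end flags` line (in hex) per resource, with lines 0 to 5 being BAR0 to BAR5. The helper name `bar_report` is hypothetical. A 64-bit BAR occupies two slots, so the slot right after it always reads as 0M too.

```sh
# Sketch (hypothetical helper): summarize BAR0-BAR5 from a PCI device's
# sysfs "resource" file, read on stdin. Each line is "start end flags" in hex.
# The body runs in a subshell so its variables do not leak into the caller.
bar_report() (
  i=0
  while [ "$i" -le 5 ] && read -r start end _flags; do
    if [ $(( $end )) -eq 0 ]; then
      # end address 0: nothing assigned (also the upper half of a 64-bit BAR)
      echo "BAR$i: 0M (no address)"
    else
      # size in MiB, the unit the NVRM message uses
      echo "BAR$i: $(( ($end - $start + 1) >> 20 ))M @ $start"
    fi
    i=$(( i + 1 ))
  done
)
```

For example, `bar_report < /sys/bus/pci/devices/0000:04:00.0/resource`. A `BAR0: 0M` line would agree with the NVRM message.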

Already tried:

  1. Enabled “Above 4G support” in the BIOS
  2. Tried the acpi=off kernel option
  3. Tried the pci=nocrs,noearly kernel option (also in combination with acpi=off)

The messages stay the same in every case.
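When a kernel option seems to change nothing, it is worth confirming that it actually reached the running kernel (for instance, that the bootloader configuration was regenerated after being edited). A small POSIX sh sketch, assuming the options are checked against `/proc/cmdline`; the helper name `check_cmdline` is hypothetical, and it only looks for the two options listed above.

```sh
# Sketch (hypothetical helper): report whether acpi=off or any pci=...
# option appears in a kernel command line passed as the first argument.
check_cmdline() (
  set -f    # split the command line into words without glob expansion
  found=0
  for opt in $1; do
    case $opt in
      acpi=off|pci=*)
        echo "active: $opt"
        found=1
        ;;
    esac
  done
  [ "$found" -eq 1 ] || echo "no acpi=off or pci= option on this boot"
)
```

Usage: `check_cmdline "$(cat /proc/cmdline)"`.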

The GeForce is detected, but the Tesla is not:

$ nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-4ba37c41-4de6-5e9a-2826-9dda420c2a4d)

Other info:

$ lspci -vvv | grep -i -A 20 nvidia

04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 17
	Region 1: Memory at 1400000000 (64-bit, prefetchable)
	Region 3: Memory at 1200000000 (64-bit, prefetchable)
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

05:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
	Subsystem: NVIDIA Corporation GK210GL [Tesla K80]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 17
	Region 1: Memory at 1800000000 (64-bit, prefetchable)
	Region 3: Memory at 1c00000000 (64-bit, prefetchable)
	Capabilities: [60] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [78] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 25.000W
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #16, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
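In both listings only Region 1 and Region 3 appear, with no Region 0 line, which is consistent with the `BAR0 is 0M @ 0x0` messages. A POSIX sh sketch for checking this mechanically, assuming `lspci -vvv` output on stdin where each device block begins with a `bus:device.function` address at the start of a line. The helper name `list_regions` is hypothetical.

```sh
# Sketch (hypothetical helper): print every "Region N:" line of lspci -vvv
# output read on stdin, prefixed with its device address, and note each
# device that lists no Region 0 (BAR0) at all.
list_regions() {
  awk '
    function flush() {
      if (dev != "" && !seen0) print dev ": no Region 0 (BAR0) listed"
    }
    # a device block starts with an address such as "04:00.0 3D controller: ..."
    /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { flush(); dev = $1; seen0 = 0; next }
    /^[ \t]*Region [0-9]+:/ {
      line = $0
      sub(/^[ \t]+/, "", line)
      if (line ~ /^Region 0:/) seen0 = 1
      print dev ": " line
    }
    END { flush() }
  '
}
```

For example, `lspci -vvv -d 10de: | list_regions` limits the output to NVIDIA devices (PCI vendor ID 10de), so the GTX 1080 Ti can serve as a reference for what a populated Region 0 looks like.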

Please help

Hi, I think you should review this. It’s not for your specific board, but the principles apply.

https://www.supermicro.com/support/faqs/faq.cfm?faq=20016

You need to find the PCIe MMIOHBase option in your BIOS and make sure its size is set to between 32 and 42 bits. MMIOHBase may be a hidden option that only becomes visible with a modified BIOS.
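The reply does not spell out what an N-bit setting covers. Reading it, as an assumption, as an N-bit address width, 2^32 bytes is 4 GiB and 2^42 bytes is 4 TiB. The POSIX sh sketch below (hypothetical helper names) does that arithmetic and checks whether a region address as printed by `lspci`, such as `1400000000`, lies below a given boundary.

```sh
# Sketch (hypothetical helpers), assuming "N bits" means an N-bit address width.
# Size of an N-bit address space in GiB: 2^N bytes divided by 2^30.
bits_to_gib() {
  echo $(( (1 << $1) >> 30 ))
}

# Does a hex address, written as lspci prints it (no 0x prefix), lie below 2^N?
below_boundary() {
  if [ $(( 0x$1 )) -lt $(( 1 << $2 )) ]; then echo yes; else echo no; fi
}
```

With these helpers, `bits_to_gib 32` gives 4 and `bits_to_gib 42` gives 4096. The Tesla's 64-bit regions in the `lspci` output start between 0x1200000000 (72 GiB) and 0x1c00000000 (112 GiB), so `below_boundary 1400000000 32` prints `no` while `below_boundary 1c00000000 42` prints `yes`.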

Hello @roelmcbs2! Did you solve the reported issue? I want to install a K80 on a similar motherboard (X99 Extreme4), but I am worried about incompatibility. Please advise! Thanks a lot!