Ubuntu, NVIDIA 390 Drivers, and K80 Issues

Hi All,

I have been having issues installing NVIDIA drivers on my Ubuntu box. The machine uses the onboard Intel GPU for display, and I hope to use the K80s for compute.

I have tried installing through several different routes, including the GUI and a few different options on the command line.

Anyhow, I am not very adept at Linux and would appreciate any help you all can give.

When I try “nvidia-smi” --> “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

“dkms status” -->
nvidia, 390.138, 5.4.0-53-generic, x86_64: installed

“lspci -v | grep 3D” -->
03:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
04:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
07:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
08:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

“nvidia-settings” -->
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system

Here is the debug report:
nvidia-bug-report.log.gz (2.0 MB)

I appreciate any help. Also, being that I am a Linux moron, please use baby steps.

-Blake

It looks like there’s a system-level configuration problem that is preventing the GPUs from being initialized:

Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282265] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282265] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:03:00.0)
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282266] NVRM: The system BIOS may have misconfigured your GPU.
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282275] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282275] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:04:00.0)
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282276] NVRM: The system BIOS may have misconfigured your GPU.
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282283] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282283] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:07:00.0)
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282283] NVRM: The system BIOS may have misconfigured your GPU.
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282290] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282290] NVRM: BAR0 is 0M @ 0x0 (PCI:0000:08:00.0)
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282290] NVRM: The system BIOS may have misconfigured your GPU.
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282313] NVRM: The NVIDIA probe routine failed for 4 device(s).
Nov 15 17:05:48 blake-MS-7751 kernel: [ 2006.282313] NVRM: None of the NVIDIA graphics adapters were initialized!
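
If you want to pull just the affected devices out of the kernel log yourself, something like the following should work. It is only a rough sketch: the function name `nvrm_bad_bars` is made up for this post, and it assumes the NVRM messages have the same wording as the ones quoted above.

```shell
# List each PCI address that NVRM reported with a zero-sized BAR.
# Reads kernel log text on stdin. The function name is hypothetical.
nvrm_bad_bars() {
    # Keep only the "BARn is 0M @ 0x0 (PCI:...)" part of each matching line,
    # then reduce it to "<pci address> <BAR name>" and drop duplicates.
    grep -oE 'NVRM: BAR[0-9]+ is 0M @ 0x0 \(PCI:[0-9a-fA-F:.]+\)' \
        | sed -E 's/^NVRM: (BAR[0-9]+) is 0M @ 0x0 \(PCI:(.*)\)$/\2 \1/' \
        | sort -u
}
```

After pasting the function into a terminal, run it as `sudo dmesg | nvrm_bad_bars`. Each output line is a PCI address followed by the BAR that has no memory assigned.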

One common cause of this kind of problem is a system BIOS that is configured not to allow PCIe device resources to be mapped above the 4 GB address boundary. With enough GPUs in the system, the firmware can run out of address space below 4 GB and simply fail to assign mappings for these important memory resources (the BARs in the log above).

Please check your system BIOS to see if there’s an option called something like “Above 4G decoding” and make sure it’s enabled.
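
Once you have changed the setting and rebooted, one way to check whether the GPUs now have their memory regions is to look at what `lspci` reports for them. This is a sketch, not an official procedure: the function name `count_unassigned_bars` is made up, and it assumes your `lspci` prints `<unassigned>` or `<ignored>` for a region that has no address.

```shell
# Count memory regions that lspci shows as unassigned or ignored.
# Reads `lspci -v` output on stdin. The function name is hypothetical.
count_unassigned_bars() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at zero when the count is 0.
    grep -cE 'Memory at <(unassigned|ignored)>' || true
}
```

Run it as `sudo lspci -v -d 10de: | count_unassigned_bars` (`10de` is NVIDIA's PCI vendor ID). If it prints 0, check `sudo dmesg | grep NVRM` for any remaining errors and then try `nvidia-smi` again.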