My motherboard is a Gigabyte GA-Z97X-GAMING G1 WIFI-BK, and I have four Tesla K40 cards installed side by side. All four are recognized by lspci | grep -i nvidia.
So far I have tried installing the driver via:
- Ubuntu Repository
- Ubuntu graphics-drivers PPA
- Driver .deb
- CUDA Toolkit .deb
The driver installs with all of these methods, yet when I run nvidia-smi it reports that the driver is not loaded.
I have read that installing the driver via the toolkit runfile fixes the issue for some people. In my case, however, the driver installation itself fails with this method.
Here is an excerpt from nvidia-installer.log:
ERROR: Unable to load the kernel module ‘nvidia.ko’. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and preve$
Please see the log entries ‘Kernel module load error’ and ‘Kernel messages’ at the end of the file ‘/var/log/nvidia-installer.log’ for more information.
→ Kernel module load error: No such device
→ Kernel messages:
[ 97.527156] ipmi device interface
[ 97.531411] nvidia: loading out-of-tree module taints kernel.
[ 97.531416] nvidia: module license ‘NVIDIA’ taints kernel.
[ 97.531416] Disabling lock debugging due to kernel taint
[ 97.535064] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 97.539600] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 97.539882] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
[ 97.539882] NVRM: The system BIOS may have misconfigured your GPU.
[ 97.539885] nvidia: probe of 0000:03:00.0 failed with error -1
[ 97.539919] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
[ 97.539919] NVRM: The system BIOS may have misconfigured your GPU.
[ 97.539921] nvidia: probe of 0000:04:00.0 failed with error -1
[ 97.539966] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:05:00.0)
[ 97.539966] NVRM: The system BIOS may have misconfigured your GPU.
[ 97.539968] nvidia: probe of 0000:05:00.0 failed with error -1
[ 97.540012] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[ 97.540013] NVRM: The system BIOS may have misconfigured your GPU.
[ 97.540015] nvidia: probe of 0000:06:00.0 failed with error -1
[ 97.540033] NVRM: The NVIDIA probe routine failed for 4 device(s).
[ 97.540034] NVRM: None of the NVIDIA devices were initialized.
[ 97.540154] nvidia-nvlink: Unregistered the Nvlink Core, major device number 238
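To make the pattern in the log easier to see, here is a minimal Python sketch that pulls out every PCI address whose BAR the driver reported with a zero size or zero base address. The function name `invalid_bars` and the regular expression are my own, made up for illustration and not part of any NVIDIA tool, and the sketch assumes the NVRM lines look exactly like the ones in the excerpt above.

```python
import re

# Matches the NVRM continuation line from the excerpt, e.g.
#   NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
# The pattern is an assumption based only on the lines shown above.
BAR_LINE = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<pci>[0-9a-fA-F:.]+)\)"
)


def invalid_bars(log_text):
    """Return (pci_address, bar_index, size_mb, base_address) for each
    BAR line that reports a zero size or a zero base address."""
    found = []
    for match in BAR_LINE.finditer(log_text):
        size_mb = int(match.group("size"))
        base = int(match.group("addr"), 16)
        if size_mb == 0 or base == 0:
            found.append((match.group("pci"), int(match.group("bar")), size_mb, base))
    return found


if __name__ == "__main__":
    # Two lines copied from the excerpt; a real run would read the text
    # of /var/log/nvidia-installer.log or the output of dmesg instead.
    sample = (
        "NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)\n"
        "NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)\n"
    )
    for pci, bar, size_mb, base in invalid_bars(sample):
        print(f"{pci}: BAR{bar} is {size_mb}M at {base:#x}")
```

Run against the full excerpt, this should list all four cards (03:00.0 through 06:00.0), which matches the "probe routine failed for 4 device(s)" line: none of them got a usable BAR0.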
Does anyone have an idea how I can get this to work? From what I can tell, four cards may be using too many resources for this platform/motherboard, but I’d like some feedback on whether there is a way to use all four.