Dell workstation with two GPUs: one Quadro P2000, the other a Tesla M40

On my Dell Precision T7910 (kernel 4.4), both cards work well.
On my Precision T7920 (kernel 4.15), the Tesla M40 does not appear in the nvidia-smi output.

minglei@minglei-Precision-7920-Tower:~$ sudo lshw -c display
[sudo] password for minglei:
*-display
description: VGA compatible controller
product: GP106GL [Quadro P2000]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:03:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:42 memory:a2000000-a2ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:1000(size=128) memory:c0000-dffff
*-display UNCLAIMED
description: 3D controller
product: GM200GL [Tesla M40]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:b3:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress cap_list
configuration: latency=0
resources: memory:a7000000-a7ffffff

Precision-7920-Tower:~$ lspci -k | grep -A 2 -E "(VGA|3D)"
03:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
Subsystem: Dell Device 11b3
Kernel driver in use: nvidia

b3:00.0 3D controller: NVIDIA Corporation GM200GL [Tesla M40] (rev a1)
Subsystem: NVIDIA Corporation GM200GL [Tesla M40]
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
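
In the lspci output above, the P2000 has a "Kernel driver in use" line. The M40 has only "Kernel modules", which lists the modules that could handle it. The following is a minimal Python sketch of the same check done directly through sysfs. It assumes the standard Linux layout, where each entry under /sys/bus/pci/devices has a vendor file and gains a driver symlink only after a driver has bound successfully.

The function name nvidia_driver_bindings and its sysfs_root parameter are hypothetical. The parameter exists only so the function can be pointed at a test tree; neither is part of any NVIDIA tool.

```python
import os

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID of NVIDIA Corporation


def nvidia_driver_bindings(sysfs_root="/sys"):
    """Return {pci_address: driver_name or None} for every NVIDIA PCI function.

    Reads <sysfs_root>/bus/pci/devices/*/vendor and the optional 'driver'
    symlink, i.e. the information lspci -k shows as "Kernel driver in use".
    """
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    bindings = {}
    if not os.path.isdir(devices_dir):
        return bindings
    for addr in sorted(os.listdir(devices_dir)):
        dev = os.path.join(devices_dir, addr)
        try:
            with open(os.path.join(dev, "vendor")) as f:
                vendor = f.read().strip()
        except OSError:
            continue  # unreadable or not a PCI function directory
        if vendor != NVIDIA_VENDOR_ID:
            continue
        driver_link = os.path.join(dev, "driver")
        # The symlink only exists when a driver has bound (probed) successfully.
        if os.path.islink(driver_link):
            bindings[addr] = os.path.basename(os.readlink(driver_link))
        else:
            bindings[addr] = None
    return bindings


if __name__ == "__main__":
    for addr, drv in nvidia_driver_bindings().items():
        print(f"{addr}: {drv or 'no driver bound'}")
```

On the T7920, one would expect 0000:03:00.0 to map to nvidia and 0000:b3:00.0 to map to no driver, matching what lspci reports.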

Precision-7920-Tower:~$ dmesg |grep nvidia
[ 18.095594] nvidia: loading out-of-tree module taints kernel.
[ 18.095602] nvidia: module license 'NVIDIA' taints kernel.
[ 18.101256] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 18.108406] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[ 18.108753] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 18.108875] nvidia 0000:b3:00.0: enabling device (0100 -> 0102)
[ 18.108941] nvidia: probe of 0000:b3:00.0 failed with error -1
[ 18.110475] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 384.130 Wed Mar 21 02:59:49 PDT 2018
[ 18.111278] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 18.111279] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[ 19.069915] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 239
[ 19.782670] caller os_map_kernel_space.part.2+0x6d/0x80 [nvidia] mapping multiple BARs
[ 21.141326] caller os_map_kernel_space.part.2+0x6d/0x80 [nvidia] mapping multiple BARs
[ 21.420157] nvidia-modeset: Allocated GPU:0 (GPU-b3740e00-0d89-fd9a-9a77-396e1bc17e7a) @ PCI:0000:03:00.0
[ 1814.278799] nvidia-modeset: Freed GPU:0 (GPU-b3740e00-0d89-fd9a-9a77-396e1bc17e7a) @ PCI:0000:03:00.0
[ 1984.399010] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 1984.446293] [drm] [nvidia-drm] [GPU ID 0x00000300] Unloading driver
[ 1984.487992] nvidia-modeset: Unloading
[ 1984.527949] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[ 2001.630546] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 2001.631132] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 2001.731089] nvidia: probe of 0000:b3:00.0 failed with error -1
[ 2001.744559] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 235
[ 2001.747988] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 418.40.04 Fri Mar 15 00:50:21 CDT 2019
[ 2001.751461] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 2001.751464] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[ 2001.753794] [drm] [nvidia-drm] [GPU ID 0x00000300] Unloading driver
[ 2001.792312] nvidia-modeset: Unloading
[ 2001.817120] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 2001.860391] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[ 2016.815377] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 2016.815879] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 2016.916337] nvidia: probe of 0000:b3:00.0 failed with error -1
[ 2016.921069] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 418.40.04 Fri Mar 15 00:50:21 CDT 2019
[ 2016.922813] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 2016.922817] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[ 2060.506297] caller os_map_kernel_space.part.6+0x6d/0x80 [nvidia] mapping multiple BARs
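
This log contains three driver loads: 384.130 at boot, then 418.40.04 twice. In all three, the probe of 0000:b3:00.0 fails with error -1, while nvidia-drm initializes on 0000:03:00.0 each time. That suggests the failure does not depend on the driver version.

The following is a small Python sketch that tallies this from saved dmesg text. The regular expressions are assumptions fitted to the lines shown in this thread, not a stable kernel log format. The function name summarize is hypothetical.

```python
import re
from collections import Counter

# Patterns fitted to the nvidia/nvidia-drm/nvidia-modeset lines in this thread.
PROBE_FAILED = re.compile(r"nvidia: probe of (\S+) failed with error (-?\d+)")
DRM_INIT = re.compile(r"Initialized nvidia-drm \S+ \S+ for (\S+) on minor")
DRIVER_VERSION = re.compile(r"Kernel Mode Setting Driver for UNIX platforms (\S+)")


def summarize(dmesg_text):
    """Count failed probes and nvidia-drm initializations per PCI address.

    Also returns the driver versions in the order they were loaded.
    """
    failed = Counter(m[1] for m in PROBE_FAILED.finditer(dmesg_text))
    initialized = Counter(m[1] for m in DRM_INIT.finditer(dmesg_text))
    versions = [m[1] for m in DRIVER_VERSION.finditer(dmesg_text)]
    return failed, initialized, versions
```

For example, save the output of dmesg to a file and pass its contents to summarize(). A non-empty failed counter next to an unchanged initialized counter reproduces the pattern described above.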

nvidia-bug-report.log.gz (1020 KB)

[   16.833153] pci 0000:b2:00.0: BAR 15: no space for [mem size 0xc00000000 64bit pref]
[   16.833154] pci 0000:b2:00.0: BAR 15: failed to assign [mem size 0xc00000000 64bit pref]
[   16.833156] pci 0000:b2:00.0: BAR 14: assigned [mem 0xa7000000-0xa7ffffff]
[   16.833159] pci 0000:b3:00.0: BAR 1: no space for [mem size 0x800000000 64bit pref]
[   16.833161] pci 0000:b3:00.0: BAR 1: failed to assign [mem size 0x800000000 64bit pref]
[   16.833163] pci 0000:b3:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[   16.833164] pci 0000:b3:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
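
The addresses in these lines fit together. 0000:b2:00.0 is presumably the bridge directly above the M40 at 0000:b3:00.0. In the kernel's numbering for bridges, BAR 14 is the bridge's non-prefetchable memory window and BAR 15 its prefetchable one. BAR 14 did receive 0xa7000000-0xa7ffffff, the only range lshw lists for the M40. The prefetchable window, which would hold the M40's BAR 1 and BAR 3, got nothing.

Converting the hex sizes shows why the firmware setting matters. BAR 1 asks for 32 GiB and the bridge window for 48 GiB, and both are larger than the entire 4 GiB 32-bit address space. They can therefore only be placed above 4 GiB.

The following is a small Python sketch of that conversion. The regular expression is an assumption fitted to the lines quoted here, and the helper names are hypothetical.

```python
import re

# Fitted to lines such as:
#   pci 0000:b3:00.0: BAR 1: failed to assign [mem size 0x800000000 64bit pref]
FAILED_BAR = re.compile(
    r"pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign \[mem size (?P<size>0x[0-9a-f]+)"
)

FOUR_GIB = 1 << 32  # size of the whole 32-bit physical address space

# The three "failed to assign" lines quoted in this thread.
LOG = """\
[   16.833154] pci 0000:b2:00.0: BAR 15: failed to assign [mem size 0xc00000000 64bit pref]
[   16.833161] pci 0000:b3:00.0: BAR 1: failed to assign [mem size 0x800000000 64bit pref]
[   16.833164] pci 0000:b3:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
"""


def failed_bars(dmesg_text):
    """Return [(pci_address, bar_number, size_in_bytes)] for each failed BAR."""
    return [
        (m["addr"], int(m["bar"]), int(m["size"], 16))
        for m in FAILED_BAR.finditer(dmesg_text)
    ]


def human(size):
    """Format a byte count using binary units."""
    for unit, shift in (("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if size >= 1 << shift:
            return f"{size / (1 << shift):g} {unit}"
    return f"{size} B"


if __name__ == "__main__":
    for addr, bar, size in failed_bars(LOG):
        line = f"{addr} BAR {bar}: {human(size)}"
        if size > FOUR_GIB:
            line += " (larger than the whole 32-bit address space)"
        print(line)
```

Run on the quoted lines, this should report 48 GiB for the bridge window, 32 GiB for BAR 1 and 32 MiB for BAR 3. The "no space for" lines are deliberately not matched, so each BAR is counted once.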

It’s a bug in newer kernels. Please check whether ‘above 4G decoding’ is enabled in the BIOS, if that option is available; otherwise, the only thing left to try is downgrading the kernel, e.g. to 4.14.
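
After changing the setting and rebooting, one way to see whether it took effect is to check where the kernel placed the M40's memory ranges. The following is a minimal Python sketch that parses /sys/bus/pci/devices/<address>/resource. In that file, each line holds start, end and flags as hex numbers.

The sketch assumes the M40 is still at 0000:b3:00.0, the address reported earlier in this thread. It treats an all-zero line as either unimplemented or unassigned, because the file does not distinguish the two. The function names are hypothetical. If the option works, the 32 GiB BAR would be expected to show up above 4 GiB.

```python
import os

FOUR_GIB = 1 << 32  # first address outside the 32-bit range


def parse_resource_file(text):
    """Parse a sysfs PCI 'resource' file into (index, start, end, flags) tuples."""
    rows = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a start/end/flags triple
        start, end, flags = (int(x, 16) for x in fields)
        rows.append((index, start, end, flags))
    return rows


def describe(rows):
    """Yield one status line per populated resource, noting whether it lies above 4 GiB."""
    for index, start, end, _flags in rows:
        if start == 0 and end == 0:
            continue  # unimplemented or unassigned; the file does not say which
        size = end - start + 1
        where = "above 4 GiB" if start >= FOUR_GIB else "below 4 GiB"
        yield f"resource {index}: size {size:#x} at {start:#x} ({where})"


if __name__ == "__main__":
    # Assumed address of the Tesla M40, taken from the lshw/lspci output in this thread.
    path = "/sys/bus/pci/devices/0000:b3:00.0/resource"
    if os.path.exists(path):
        with open(path) as f:
            for status in describe(parse_resource_file(f.read())):
                print(status)
    else:
        print(f"{path} not found")
```

Sizes are printed in hex to match the "mem size 0x..." notation in the kernel log.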

I enabled the ‘above 4G decoding’ setting and downgraded the kernel to 4.4, the same version as on my other workstation, but the problem still exists.

nvidia-bug-report.log.gz (133 KB)