Driver allocating memory beyond PCI slot size

Hi,

I have a 1050 Ti GPU and driver version 450.80.02 on RHEL 8.3. The display does not come up at the point where the graphical session should start. This has happened repeatedly today; sometimes I have gotten around it by rebooting into the older kernel from GRUB.

Earlier it worked with the devel driver, but I had to downgrade because Flatpak-packaged software from Flathub does not support the NVIDIA devel drivers. Even with the devel driver I always had to switch the port of the LG 4K monitor at each boot to get it recognized. With the current driver it worked at first, but now it is broken. The displays are a Samsung Full HD monitor (DVI) and an LG 4K monitor (HDMI).

dmesg shows these lines related to nvidia:

[  +0.000151] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input16
[  +0.000214] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input17
[  +0.000237] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input18
[  +0.000142] input: HDA NVidia HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input19
[  +0.000104] input: HDA NVidia HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input20
[  +0.000532] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[  +0.087752] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.80.02  Wed Sep 23 01:13:39 UTC 2020
[  +0.051973] nvidia-uvm: Loaded the UVM driver, major device number 239.
[  +0.030732] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  450.80.02  Wed Sep 23 00:48:09 UTC 2020
[  +0.006106] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[  +0.000003] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[  +0.648023] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000208] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[  +0.080295] usb 1-4: reset high-speed USB device number 4 using ehci-pci
[  +0.585993] virbr0: port 1(virbr0-nic) entered disabled state
[ +15.592257] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000032] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  +0.644779] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000233] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[Nov18 10:05] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000032] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  +0.591821] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000209] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[ +16.164278] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000033] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  +0.589168] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000244] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[ +16.165303] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000029] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  +0.590336] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000219] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[Nov18 10:06] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000032] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[  +0.587230] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000249] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[ +16.164630] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x65:1266)
[  +0.000030] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
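The title refers to the repeated "resource sanity check" warning above. Here is a quick bash sketch of that arithmetic, assuming both ranges are inclusive [start-end] bounds as the kernel prints them. The helper name range_size is hypothetical. It only quantifies how far the nvidia module's request extends past the PCI Bus 0000:00 window; it does not by itself show that this is what makes RmInitAdapter fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: size in bytes of an inclusive [start-end] range,
# as printed in the kernel's "resource sanity check" message.
range_size() {
  echo $(( $2 - $1 + 1 ))
}

# Values copied from the dmesg warning above.
requested=$(range_size 0x000c0000 0x000fffff)  # range requested by the nvidia module
window=$(range_size 0x000c0000 0x000dffff)     # PCI Bus 0000:00 window
overrun=$(( requested - window ))              # part of the request outside the window

printf 'requested: %d bytes (%d KiB)\n' "$requested" $(( requested / 1024 ))
printf 'window:    %d bytes (%d KiB)\n' "$window" $(( window / 1024 ))
printf 'overrun:   %d bytes (%d KiB)\n' "$overrun" $(( overrun / 1024 ))
```

Run with bash, this prints a 256 KiB request against a 128 KiB window, i.e. the request is twice the window size and overruns it by 128 KiB.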

And the X server log ends with this:

[   101.396] (II) NVIDIA GLX Module  450.80.02  Wed Sep 23 00:51:32 UTC 2020
[   101.396] (II) NVIDIA: The X server supports PRIME Render Offload.
[   101.396] (WW) NVIDIA(0): Failed to initialize Base Mosaic!  Reason: Only one GPU
[   101.396] (WW) NVIDIA(0):     detected.  Only one GPU will be used for this X screen.
[   117.809] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0.  Please
[   117.809] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[   117.809] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[   117.809] (EE) NVIDIA(GPU-0):     README for additional information.
[   117.809] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[   117.809] (EE) NVIDIA(0): Failing initialization of X screen
[   117.809] (II) UnloadModule: "nvidia"
[   117.809] (II) UnloadSubModule: "glxserver_nvidia"
[   117.809] (II) Unloading glxserver_nvidia
[   117.809] (II) UnloadSubModule: "wfb"
[   117.809] (II) UnloadSubModule: "fb"
[   117.809] (EE) Screen(s) found, but none have a usable configuration.
[   117.809] (EE) 
Fatal server error:
[   117.809] (EE) no screens found(EE) 
[   117.809] (EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
[   117.809] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[   117.809] (EE) 
[   117.812] (EE) Server terminated with error (1). Closing log file.

Modinfo:

modinfo nvidia
filename:       /lib/modules/4.18.0-240.1.1.el8_3.x86_64/extra/drivers/video/nvidia/nvidia.ko
alias:          char-major-195-*
version:        450.80.02
supported:      external
license:        NVIDIA
rhelversion:    8.3
srcversion:     2132A76E28730AB295AF17B

Under Windows, both monitors work fine. This is a dual-boot machine, and it had worked fine for a long time.

BR,

ikke

Now I rebooted into the older kernel (4.18.0-193.28.1.el8_2), and it works. This is the dmesg output from that boot:

dmesg -H | grep -i -A2 -B2 nvidia
[  +0.000071] input: HDA ATI SB Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card1/input22
[  +0.057355] usbcore: registered new interface driver snd-usb-audio
[  +0.152953] nvidia: loading out-of-tree module taints kernel.
[  +0.000011] nvidia: module license 'NVIDIA' taints kernel.
[  +0.000000] Disabling lock debugging due to kernel taint
[  +0.009578] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  +0.011320] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[  +0.013077] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[  +0.023867] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input23
[  +0.000096] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input24
[  +0.000074] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card2/input25
[  +0.059442] kvm: Nested Paging enabled
[  +0.004572] MCE: In-kernel MCE decoding enabled.
--
               Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
               (Note that use of the override may cause unknown side effects.)
[  +0.032696] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.80.02  Wed Sep 23 01:13:39 UTC 2020
[  +0.051806] nvidia-uvm: Loaded the UVM driver, major device number 238.
[  +0.030977] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  450.80.02  Wed Sep 23 00:48:09 UTC 2020
[  +0.007377] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[  +0.000002] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[  +0.499297] XFS (dm-2): Mounting V5 Filesystem
[  +0.000964] XFS (sdd2): Mounting V5 Filesystem
--
[  +0.061189] virbr0: port 1(virbr0-nic) entered disabled state
[  +0.637155] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[  +0.000229] caller _nv000745rm+0x1af/0x200 [nvidia] mapping multiple BARs
[  +0.531917] usb 1-4: reset high-speed USB device number 4 using ehci-pci
[  +0.093284] hrtimer: interrupt took 3008553 ns

modinfo:

modinfo nvidia
filename:       /lib/modules/4.18.0-193.28.1.el8_2.x86_64/extra/drivers/video/nvidia/nvidia.ko
alias:          char-major-195-*
version:        450.80.02
supported:      external
license:        NVIDIA
rhelversion:    8.2
srcversion:     2132A76E28730AB295AF17B
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        
name:           nvidia
vermagic:       4.18.0-193.28.1.el8_2.x86_64 SMP mod_unload modversions 
sig_id:         PKCS#7
signer:         NVIDIA