RHEL 7.9 / CentOS 7.9 /x64 UEFI / black screen on TTY with NVIDIA Quadro P400 drivers

Rejean-L · March 29, 2022, 10:04pm

RHEL 7.9 / CentOS 7.9 /x64 UEFI / black screen on TTY with NVIDIA Quadro P400 drivers version (nvidia-510.60.02 or 510.54)

Hi,

I have looked everywhere on the net. I have seen reports of this issue for some Linux distributions and tried everything I could find… but nothing works.

A fresh, fully updated install of CentOS 7.9 (GPT + UEFI + GRUB2, with GNOME) works fine with the Nouveau driver… until I install the NVIDIA driver.

After that it fails to show any boot activity or any text terminal… switching to terminals 2 through 6 (using Ctrl+Alt+F2 … to F6) loses the video signal on the screen.

Only tty 1 with gnome is displaying the login screen.

Is there a solution somewhere?
Does anyone know where to start with this?

thank you !!

nvidia-bug-report.log.gz (260.5 KB)
nvidia-bug-report-20220330.log.gz (269.5 KB)

generix · March 30, 2022, 9:01am

You could start by checking dmesg for what’s happening with fbcon and efifb.
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

Rejean-L · March 30, 2022, 5:41pm

I have added the bug report.

I don’t see much about fbcon and efifb… here it is. not sure I understand it.

[ 0.805893] efifb: invalid framebuffer address
[ 0.805903] Device 'efifb.0' does not have a release() function, it is broken and must be fixed.
[ 0.805933] [] efifb_init+0x28f/0x2b0

Rejean-L · March 30, 2022, 5:44pm

thank you very much.

I have added 2 bug reports… (they need to be accepted by an admin)

I don’t see much about fbcon and efifb and don’t know what it means. Here:

% dmesg | grep -i efifb
[ 0.805893] efifb: invalid framebuffer address
[ 0.805903] Device 'efifb.0' does not have a release() function, it is broken and must be fixed.
[ 0.805933] [] efifb_init+0x28f/0x2b0

generix · March 30, 2022, 6:21pm

Somehow, the efifb doesn’t get assigned to the nvidia card, i.e. the dmesg is missing the message

pci 0000:01:00.0: BAR 2: assigned to efifb

so the efifb driver then complains and crashes.
Though I don’t know where this comes from, might be a bug in either system bios or video bios or kernel.
Can you possibly boot some live distro with a current kernel and cross check the dmesg output? Also seems the cpu is not fully supported by the old RHEL7 kernel.
With nouveau this works because it implements a drmfb which replaces the standard efifb.
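
For anyone cross-checking their own logs, the two cases discussed here can be told apart with a small shell helper. This is only a sketch: `efifb_state` is a made-up name, and the two strings it looks for are simply the messages quoted in this thread, not an exhaustive list of what efifb can print.

```shell
# Hypothetical helper: classify the efifb state from kernel log text on stdin.
# Prints "assigned" if a PCI BAR was handed to efifb, "invalid" if efifb
# rejected the framebuffer address, and "unknown" otherwise.
# If both messages are present, "assigned" wins.
efifb_state() {
    log=$(cat)
    case "$log" in
        *"assigned to efifb"*)                  echo "assigned" ;;
        *"efifb: invalid framebuffer address"*) echo "invalid" ;;
        *)                                      echo "unknown" ;;
    esac
}

# Typical use on the affected machine:
#   dmesg | efifb_state
```

Because it reads from stdin, the same function can be pointed at a saved log, for example one captured from a live distro with a newer kernel.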

Rejean-L · March 31, 2022, 12:15am

Thank you. that is the start of understanding. Still unsure what to do with this.

The CPU is an Intel 11th gen and may not be fully supported by the old RHEL7 kernel… but I also had those kinds of messages with a 9th generation CPU and it was working… and it is working with the Nouveau driver, so I guess the kernel can handle it anyway.

A live distro seems unnecessary to me, since it won’t have the nvidia driver installed and I can’t install it there.

latest Centos 7.9 installed with latest nvidia driver on a uEFI + GPT installation…
I am pretty sure it was working with BIOS + CSM + MBR with the same CentOS and Nvidia.

I am unsure why the efifb doesn’t get assigned to the nvidia card.
How can I assign it?

Rejean-L · March 31, 2022, 1:25am

I thought efifb was not able to read the screen resolution because I had a (mini DP to VGA) or (mini DP to HDMI) converter… so I changed the cable and connected a DP to DP monitor… but the same issue occurs.

I also saw a thread on the Red Hat forum that suggested adding a base resolution to grub.cfg … video=efifb:width:640,height:480 video=640x480

but it is still the same: it fails to show the console on boot and fails to display anything on any TTY.
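
Before concluding that the option has no effect, it may be worth confirming that it actually reached the running kernel, since a hand-edited grub.cfg can be overwritten the next time it is regenerated. A minimal sketch, assuming a POSIX shell; `has_karg` is a hypothetical helper name, and its second argument exists only so the function can be tried against a saved copy of the command line.

```shell
# Hypothetical helper: succeed if the kernel command line contains the
# exact parameter given as $1. $2 is the file to read and defaults to
# /proc/cmdline.
has_karg() {
    # Split the command line into one parameter per line, then look for
    # an exact, literal, whole-line match.
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -q -x -F -e "$1"
}

# Typical use after rebooting with the edited boot entry:
#   has_karg 'video=640x480' && echo "parameter is active"
```

If the parameter is missing from /proc/cmdline, the boot entry that was edited is not the one that was booted, which is a different problem from efifb ignoring the option.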

Rejean-L · March 31, 2022, 2:04am

I decided to test if the latest Linux kernel would support efifb better… using the kernel from elrepo…
yum --enablerepo=elrepo-kernel install kernel-ml

so it updated the CentOS kernel to 5.17.1 …

errors about CPU and efifb disappeared for sure… and I got a BAR 1…

[root@pcys05:~ ]% dmesg | grep efifb
[    0.264592] pci 0000:01:00.0: BAR 1: assigned to efifb

but the nvidia driver does not load… and I got no X11 at all…
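
A quick way to confirm "does not load" is to look for the module in /proc/modules, where the first field of each line is a loaded module's name. The helper below is only a sketch under that assumption; `module_loaded` is a made-up name, and the optional second argument is there so it can be run against a saved copy of the file.

```shell
# Hypothetical helper: succeed if the module named $1 is listed as loaded.
# $2 is the file to read and defaults to /proc/modules.
module_loaded() {
    # Match on the first field only, so a module that merely appears in
    # another module's dependency list is not counted.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Typical use:
#   module_loaded nvidia  && echo "nvidia is loaded"
#   module_loaded nouveau && echo "nouveau is loaded"
```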

generix · March 31, 2022, 12:50pm

How did you initially install the driver?

Rejean-L · March 31, 2022, 6:15pm

From elrepo EL79

yum --enablerepo=elrepo install -y kmod-nvidia

Installed Packages
Name        : kmod-nvidia
Arch        : x86_64
Version     : 510.60.02
Release     : 1.el7_9.elrepo
Size        : 132 M
Repo        : installed
From repo   : elrepo
Summary     : nvidia kernel module(s)
URL         : http://www.nvidia.com/
License     : Proprietary
Description : This package provides the nvidia kernel module(s) built
            : for the Linux kernel using the x86_64 family of processors.

generix · April 1, 2022, 9:49am

Please try (re)installing the headers kernel-ml-devel
Afterwards, please post the output of
dkms status
and
sudo yum list installed |grep nvidia

Thinking · April 1, 2022, 4:17pm

Hi, I just wanted to chime in and say that I am experiencing a similar issue:

My System:
OS: Fedora Silverblue 36 Beta
Platform: AMD B550
GPU: Nvidia 980ti
Driver: 510.60.02

I have a parallel installation of Arch Linux (Nvidia Driver 510.54-11) which has no issues with display output or TTYs. (Note: In case it matters, the Arch installation also uses Booster instead of Fedora’s Dracut for its initramfs, both have encrypted root partitions)

On my Silverblue install with the new 510.60.02 driver I am seeing the following symptoms:

1. TTYs don’t work, screen goes black and monitor says “no signal”, going back into GDM or Gnome works and video output resumes.
2. The boot screen shows text output just fine until a certain point after which it will only show “graphic” output such as the graphic disk encryption password entry screen, this can be seen by removing rhgb from the kernel arguments. The exact point at which it happens doesn’t appear to be consistent, but usually around the time when various nvidia related messages appear. Interestingly, the way this happens is that (if quiet isn’t in the kernel arguments) the boot message output will just freeze (not going into “no signal”), meanwhile input is still processed in the background, such as the disk password entry.
3. This one I managed to fix but I think it is probably related to the other issues: When booting with the Nvidia driver, it would either A) Go into “no signal” when GDM should appear or B) Work, but not run with Wayland until signing into a user and back out, at which point the Wayland option would be available. Whichever occurs seems somewhat random, though I have had the most luck “unbreaking” it by fully shutting down the PC and turning off power for a minute or so. The fix was to add add_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm " to the Dracut configuration (and enable initramfs generation via rpm-ostree), after that Wayland is, so far, always available and no more “no signal” (except for the TTY bug which is still there).

Once again, none of this is a problem on my Arch install.

I tried to install the same version of the Nvidia driver which I have on my Arch install on Fedora Silverblue but I ran into some weird issues and decided it’s not worth it for now, maybe I’ll attempt it again later.

Update:

I just installed 510.60.02 on Arch Linux and… it works perfectly! So the issue must lie somewhere else other than the driver version.

Update 2:

…and it seems to work in Fedora Silverblue 35. So this might not be a Nvidia issue after all. I will look into the right channels to report this to Red Hat/Fedora.
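
For reference, the Dracut change described above could be written as a drop-in file roughly like this. It is a sketch, not the exact file from that system: the helper name `write_nvidia_dracut_conf`, the file name nvidia.conf and the default directory are assumptions, and only the add_drivers line itself comes from the post.

```shell
# Hypothetical helper: write a dracut drop-in that pulls the NVIDIA modules
# into the initramfs. $1 is the drop-in directory (default
# /etc/dracut.conf.d); the file name nvidia.conf is an arbitrary choice.
write_nvidia_dracut_conf() {
    dir=${1:-/etc/dracut.conf.d}
    printf '%s\n' 'add_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "' > "$dir/nvidia.conf"
}

# After writing the file (as root), the initramfs has to be rebuilt:
#   rpm-ostree initramfs --enable   # Silverblue, as described in the post
#   dracut --force                  # conventionally installed systems
```

The spaces inside the quotes matter, because dracut appends the value to an existing space-separated list.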

Rejean-L · April 1, 2022, 6:20pm

kernel-ml-devel was already correctly installed, but I reinstalled it anyway… removed the nvidia driver and rebooted… then reinstalled kmod-nvidia and rebooted again.

% dkms status
(nothing to show)

% yum list installed | grep nvidia
Loaded plugins: fastestmirror, langpacks, nvidia
kmod-nvidia.x86_64                     510.60.02-1.el7_9.elrepo        @elrepo  
nvidia-detect.x86_64                   510.47.03-1.el7.elrepo          @elrepo  
nvidia-x11-drv.x86_64                  510.60.02-1.el7_9.elrepo        @elrepo  
nvidia-x11-drv-libs.x86_64             510.60.02-1.el7_9.elrepo        @elrepo  
yum-plugin-nvidia.noarch               1.0.2-1.el7.elrepo              @elrepo

Latest kernel installed. With the nvidia driver, gdm still fails to load… so basically I only have text mode if nvidia is installed (with nouveau, gdm is OK).

I am not sure the latest kernel was a viable long-term option anyway for the many lab computers running our CAD tools… but I still wanted to make sure it could work… it does not quite, but it may show that the CentOS 7.9 kernel has issues with CPU and efifb detection.

I included a (new) bug report.
nvidia-bug-report-kernel5171.log.gz (77.9 KB)

generix · April 1, 2022, 8:56pm

The problem is that elrepo only provides nvidia modules for the stock 3.10 kernel, not the ml kernel. You would have to switch to another driver repo, e.g. the nvidia cuda repo and use nvidia-latest-dkms
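
One way to see this on the machine itself is to list which installed kernels actually have an nvidia module available. The function below is a rough sketch: `kernels_with_nvidia` is a hypothetical name, and it simply searches each kernel's module tree for a file or symlink whose name starts with nvidia.ko, without assuming where the package puts it.

```shell
# Hypothetical helper: print the kernel versions under a modules root
# (default /lib/modules) that have an nvidia.ko of some kind available.
kernels_with_nvidia() {
    root=${1:-/lib/modules}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # -name also matches symlinks, e.g. entries under weak-updates.
        if find "$dir" -name 'nvidia.ko*' 2>/dev/null | grep -q .; then
            basename "$dir"
        fi
    done
}

# Typical use: compare the list with the running kernel.
#   kernels_with_nvidia
#   uname -r
```

If the version printed by uname -r is not in the list, the driver has no module to load for the running kernel, which would match what is described here for kernel-ml.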

Rejean-L · June 16, 2022, 8:46pm

I had to revert to a fresh CentOS 7.9 install with the built-in updated kernel.
I hoped again that the latest version would work… it’s even worse !!

… kmod-nvidia.x86_64 0:515.48.07-1.el7_9.elrepo
… nvidia-x11-drv.x86_64 0:515.48.07-1.el7_9.elrepo

installed with : yum --enablerepo=elrepo install -y kmod-nvidia
removed with: yum history undo 20

The freshly installed CentOS 7.9 system is unstable with the Nvidia driver installed, and it hangs when trying to switch from gnome to the terminals… the system is then no longer usable and there is no gnome login screen anymore.

[ 0.800000] efifb: invalid framebuffer address
[ 0.800007] Device 'efifb.0' does not have a release() function, it is broken and must be fixed.
[ 0.800025] [] efifb_init+0x28f/0x2b0

I don’t see what else can be done. It just does not work… the kernel may be too old, but I don’t understand why a driver is built for EL 7.9 if it does not work at all on RHEL 7.9 or CentOS 7.9.

generix · June 17, 2022, 7:16am

The efifb is handled by the kernel only, the nvidia driver doesn’t have any influence on that. It’s a bug in 3.x kernels with some graphics cards with efi boot.

Rejean-L · June 17, 2022, 5:42pm

Did anyone report this to Red Hat to get a fix for that kernel? Can it be fixed?

“It’s a bug in 3.x kernels with some graphics cards with efi boot.”
So it does work with some other cards?

Rejean-L · June 20, 2022, 5:25pm

Did anyone report this to Red Hat to get a fix for that kernel?
Can the kernel bug be worked around in the driver?
So does it work with some other cards and that driver, kmod-nvidia (515.48.07)?

I ask because I have a lot of broken machines here… even more broken since the latest driver update, because at every logout the machine needs to be rebooted and the login screen no longer appears (black screen).