CentOS 8.2 GDM crashes after CUDA 11 install

I have a fresh install of CentOS 8.2, fully updated, with EPEL and PowerTools enabled. I then installed the latest CUDA version from the official site. I have a 1660 graphics card.

cuda 11 install

When I restart after installing CUDA, GDM crashes with the “oops, a problem has occurred” screen. If I boot into multi-user mode, everything works as expected. I can even run “startx” and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf, with no change. Can someone please help me get GDM to stop crashing so I can log in to my workstation normally?

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).

Here is the bug report. nvidia-bug-report.log.gz.log (1.1 MB)

Please try disabling the Intel iGPU in the BIOS. Alternatively, create an /etc/X11/xorg.conf that contains only

Section "Device"
  Identifier "nvidia"
  Driver "nvidia"
  BusID "PCI:1:0:0"
  Option "AllowEmptyInitialConfiguration" "true"
EndSection
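
As a side note on the BusID line above: lspci prints PCI addresses in hexadecimal (for example 01:00.0), while xorg.conf expects decimal values in the form PCI:bus:device:function. The helper below is only a sketch of that conversion; the function name pci_to_xorg_busid is made up for illustration, and it assumes the GPU sits in PCI domain 0000.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an lspci-style address (hex, "bus:device.function")
# into the decimal "PCI:bus:device:function" form used in xorg.conf.
# Assumption: the device is in PCI domain 0000 (the prefix is simply dropped).
pci_to_xorg_busid() {
  local addr="${1#0000:}"   # strip an optional "0000:" domain prefix
  local bus="${addr%%:*}"   # text before the first ":"
  local rest="${addr#*:}"   # "device.function"
  local dev="${rest%%.*}"
  local fn="${rest#*.}"
  # printf interprets the 0x-prefixed values as hex and prints them as decimal
  printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Address of the card as lspci would typically report it for this BusID
pci_to_xorg_busid "01:00.0"
```

On the affected machine the address itself can be looked up with something like lspci | grep -i nvidia before feeding it to the helper.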

nvidia-bug-report.log.gz.log (1.1 MB)

I disabled the iGPU in the BIOS with no change in behavior. Here is another log taken after disabling the iGPU. This same box worked with CentOS 7 without any problem.

Everything seems to work just fine; the X server is starting correctly. Maybe the client libs are broken. Please post the output of
ls -l /lib64/libGL*

libgl.txt (2.2 KB)

Here is the output. I think everything is mostly working. Like I said, if I change to a tty and then manually run startx, I can get to the desktop without a problem. It seems to be a problem with GDM. I noticed that when the driver gets installed, some packages with “egl-wayland” in the name get installed as well.

Here is an Xorg log that might help. Xorg.1.log (10.3 KB)

Specifically the line below may mean something

NVIDIA(GPU-0): Failed to acquire modesetting permission.

That’s helpful; I didn’t notice it in the logs before. Please remove “rhgb” from the kernel parameters. If that doesn’t help, please post the output of

ls -l /dev/nvid* /dev/dri/*

Removing the rhgb parameter didn’t help the GDM login.

Here is the output from /proc/cmdline

BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-193.6.3.el8_2.x86_64 root=/dev/mapper/cl_tc--server-root ro crashkernel=auto resume=/dev/mapper/cl_tc--server-swap rd.lvm.lv=cl_tc-server/root rd.lvm.lv=cl_tc-server/swap quiet rd.driver.blacklist=nouveau
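
To double-check that a parameter such as rhgb is really gone from the running kernel's command line, a small check like the following could be used. This is only a sketch; the function name has_kparam is made up for illustration, and it matches whole space-separated words only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the exact parameter $1 appears as a
# space-separated word in the kernel command line string $2.
has_kparam() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Read the command line of the running kernel (empty if /proc is unavailable)
cmdline="$(cat /proc/cmdline 2>/dev/null)"

if has_kparam rhgb "$cmdline"; then
  echo "rhgb is still present"
else
  echo "rhgb is not present"
fi
```

On CentOS 8 the parameter is usually removed persistently with grubby, for example sudo grubby --update-kernel=ALL --remove-args="rhgb", followed by a reboot.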

Here is the other output you asked for after removing the rhgb parameter

ls -l /dev/nvid* /dev/dri/*

crw-rw----+ 1 root video  226,   0 Jun 24 06:55 /dev/dri/card0
crw-rw-rw-. 1 root render 226, 128 Jun 24 06:55 /dev/dri/renderD128
crw-rw-rw-. 1 root root   195,   0 Jun 24 06:55 /dev/nvidia0
crw-rw-rw-. 1 root root   195, 255 Jun 24 06:55 /dev/nvidiactl
crw-rw-rw-. 1 root root   195, 254 Jun 24 06:55 /dev/nvidia-modeset

/dev/dri/by-path:
total 0
lrwxrwxrwx. 1 root root  8 Jun 24 06:55 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx. 1 root root 13 Jun 24 06:55 pci-0000:01:00.0-render -> ../renderD128

Just some more information about what exactly is happening.

When I boot up I get this screen:

If I immediately switch to a tty, it doesn’t work and I get this message:

If I press enter and “Log out” I get a blank screen and then back to the “Oh No!” screen. Now when I switch to a tty it works as expected.

Please let me know if there is anything else that will be helpful in diagnosing this issue.


Please post the output of
groups gdm

groups gdm
gdm : gdm

Please try adding it to the video group
sudo usermod -a -G video gdm

No change. I still get the “Oh no!” screen

groups gdm
gdm : gdm video

I don’t really have an idea what could be wrong. Maybe the driver just gets loaded too late. Please run
sudo dracut -f
to recreate the initrd.

That didn’t change anything. I boot up in multi-user mode and then run

sudo systemctl start gdm

This has the same problem that booting into the graphical mode does. I get the “Oh no” screen.

Although “startx” gets me right to the desktop and everything appears to work.

I will try to open a bug with CentOS and see if someone there has a solution.

I opened a bug on RHEL 8.2

https://bugzilla.redhat.com/show_bug.cgi?id=1851448

It looks like this is a bug related to SELinux. It is not yet clear whether this is a problem in RHEL or a bug in the NVIDIA packages.
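
For anyone who lands here with the same symptoms, one way to look for SELinux involvement is to search the audit log for AVC denials that mention GDM, Xorg, or the NVIDIA devices. The filter below is only a sketch, not the fix from the bug report; the function name filter_display_avc is made up for illustration, and the keyword list is a guess at what is relevant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read audit records on stdin and print only the AVC
# denials that mention gdm, xorg or nvidia (case-insensitive).
filter_display_avc() {
  grep -E 'avc: +denied' | grep -Ei 'gdm|xorg|nvidia'
}

# Typical use on the affected machine (needs root and the audit tools):
#   sudo ausearch -m AVC -ts boot | filter_display_avc
#
# To test whether SELinux is what blocks GDM, enforcement can be switched
# off temporarily (this lasts only until the next reboot):
#   getenforce
#   sudo setenforce 0
#   sudo systemctl restart gdm
```

If GDM starts once SELinux is permissive, the denials printed by the filter are worth attaching to the bug report.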