CentOS 8.2: GDM crashes after CUDA 11 install

I have a fresh install of CentOS 8.2, fully updated, with EPEL and PowerTools enabled. I then installed the latest CUDA version from the official site. I have a 1660 graphics card.


When I restart after installing CUDA, GDM crashes to the “A problem has occurred” error screen. If I boot into multi-user mode, everything works as expected; I can even run startx and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf, with no change. Can someone please help me keep GDM from crashing so I can log in to my workstation normally?

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to change the file extension to something else, since the forum software doesn’t accept .gz files (nifty!).

Here is the bug report: nvidia-bug-report.log.gz.log (1.1 MB)

Please try disabling the Intel iGPU in the BIOS. Alternatively, create /etc/X11/xorg.conf containing only the following:

Section "Device"
  Identifier "nvidia"
  Driver "nvidia"
  BusID "PCI:1:0:0"
  Option "AllowEmptyInitialConfiguration" "true"
EndSection
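The BusID line above uses decimal numbers, while lspci and /dev/dri/by-path print the PCI address in hex (the card appears as pci-0000:01:00.0 later in this thread). A minimal POSIX sh sketch of the conversion, assuming the usual [domain:]bus:device.function address format; pci_to_busid is a made-up helper name, not part of any package:

```shell
#!/bin/sh
# Hypothetical helper: convert a hex PCI address such as "0000:01:00.0"
# (as printed by lspci -D or /dev/dri/by-path) into the decimal
# "PCI:bus:device:function" form used by BusID in xorg.conf.
pci_to_busid() {
  addr=$1
  # Drop the PCI domain prefix ("0000:") when the address has one.
  case $addr in
    *:*:*) addr=${addr#*:} ;;
  esac
  bus=${addr%%:*}   # e.g. "01"
  rest=${addr#*:}   # e.g. "00.0"
  dev=${rest%%.*}   # e.g. "00"
  fn=${rest##*.}    # e.g. "0"
  # The address fields are hex; xorg.conf expects decimal.
  printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$fn))"
}
```

For example, `pci_to_busid 0000:01:00.0` prints `PCI:1:0:0`, matching the BusID line above. Hex and decimal only diverge once the bus, device, or function number reaches 0x0a, so on most single-GPU machines the two look the same.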

I disabled the iGPU in the BIOS with no change in behavior. Here is another log, taken after disabling the iGPU: nvidia-bug-report.log.gz.log (1.1 MB). This setup worked on the same machine with CentOS 7 without any problem.

Everything seems to work fine; the X server starts correctly. Maybe the client libraries are broken. Please post the output of
ls -l /lib64/libGL*
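As a follow-up to the listing, a small sketch that prints only the dangling symlinks among the libGL* entries, i.e. links whose target no longer exists. It assumes GNU find (for -L and -maxdepth); broken_links is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list dangling symlinks matching a name pattern
# directly inside a directory. With -L, find follows symlinks, so an
# entry that still tests as -type l is a link whose target is missing.
broken_links() {
  dir=$1
  pattern=$2
  find -L "$dir" -maxdepth 1 -name "$pattern" -type l
}
```

Running `broken_links /lib64 'libGL*'` prints nothing when every link resolves.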

libgl.txt (2.2 KB)

Here is the output. I think everything is mostly working. As I said, if I switch to a TTY and run startx manually, I can get to the desktop without a problem. It seems to be a problem with GDM. I noticed that when the driver gets installed, some packages with “egl-wayland” in the name are installed as well.

Here is an Xorg log that might help: Xorg.1.log (10.3 KB)

In particular, this line may be significant:

NVIDIA(GPU-0): Failed to acquire modesetting permission.
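One way to pull this and any other error out of an Xorg log is a plain grep. Xorg marks error lines with (EE); the sketch below matches that marker and the modesetting message itself as fixed strings and prints line numbers. xorg_errors is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print error lines (with line numbers) from an
# Xorg log. -F treats both patterns as fixed strings, so "(EE)" is
# matched literally.
xorg_errors() {
  grep -n -F -e '(EE)' -e 'Failed to acquire modesetting permission' "$1"
}
```

Usage: `xorg_errors Xorg.1.log`, pointed at whichever Xorg log you are examining (the file name here is just the attachment above).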

That’s helpful; I didn’t notice it in the logs before. Please remove “rhgb” from the kernel parameters. If that doesn’t help, please post the output of

ls -l /dev/nvid* /dev/dri/*

Removing the rhgb parameter didn’t fix the GDM login.

Here is the output of /proc/cmdline:

BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-193.6.3.el8_2.x86_64 root=/dev/mapper/cl_tc--server-root ro crashkernel=auto resume=/dev/mapper/cl_tc--server-swap rd.lvm.lv=cl_tc-server/root rd.lvm.lv=cl_tc-server/swap quiet rd.driver.blacklist=nouveau
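The command line above no longer contains rhgb. A sketch for checking this mechanically, matching whole space-separated words only, so a parameter is not matched as a substring of a longer one; has_kernel_param is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line in $2
# contains the parameter $1 as a whole space-separated word.
has_kernel_param() {
  # Pad with spaces so the first and last words also match.
  case " $2 " in
    *" $1 "*) return 0 ;;
  esac
  return 1
}
```

For example: `has_kernel_param rhgb "$(cat /proc/cmdline)" && echo "rhgb is still set"`. On CentOS/RHEL 8 the parameter can be removed persistently for all installed kernels with `sudo grubby --update-kernel=ALL --remove-args="rhgb"`.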

Here is the other output you asked for, taken after removing the rhgb parameter:

ls -l /dev/nvid* /dev/dri/*

crw-rw----+ 1 root video  226,   0 Jun 24 06:55 /dev/dri/card0
crw-rw-rw-. 1 root render 226, 128 Jun 24 06:55 /dev/dri/renderD128
crw-rw-rw-. 1 root root   195,   0 Jun 24 06:55 /dev/nvidia0
crw-rw-rw-. 1 root root   195, 255 Jun 24 06:55 /dev/nvidiactl
crw-rw-rw-. 1 root root   195, 254 Jun 24 06:55 /dev/nvidia-modeset

/dev/dri/by-path:
total 0
lrwxrwxrwx. 1 root root  8 Jun 24 06:55 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx. 1 root root 13 Jun 24 06:55 pci-0000:01:00.0-render -> ../renderD128

Here is some more information about exactly what is happening.

When I boot up I get this screen:

If I immediately switch to a TTY, it doesn’t work and I get this message:

If I press Enter and choose “Log out”, I get a blank screen and then return to the “Oh no!” screen. Now when I switch to a TTY, it works as expected.

Please let me know if there is anything else that will be helpful in diagnosing this issue.

Please post the output of
groups gdm

groups gdm
gdm : gdm

Please try adding gdm to the video group:
sudo usermod -a -G video gdm

No change. I still get the “Oh no!” screen.

groups gdm
gdm : gdm video
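`groups gdm` reads the group database, so the output above confirms the change was written. A sketch of the same check as a reusable test follows; keep in mind that processes which were already running keep their old group list, so gdm has to be restarted (or the machine rebooted) before a new membership has any effect. user_in_group is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if user $1 is a member of group $2,
# based on the group names printed by `id -nG`. An unknown user
# yields empty output and therefore a failure.
user_in_group() {
  case " $(id -nG "$1" 2>/dev/null) " in
    *" $2 "*) return 0 ;;
  esac
  return 1
}
```

Usage: `user_in_group gdm video && echo "gdm is in video"`.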

I don’t really have an idea what could be wrong. Maybe the driver just gets loaded too late; please run
sudo dracut -f
to recreate the initrd.
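After regenerating the initrd, one way to see whether the NVIDIA kernel modules actually ended up in it is to inspect the image with lsinitrd, which ships with dracut. Whether they are included at all depends on the dracut configuration, so treat this as a diagnostic sketch; initrd_has_module is a hypothetical helper that reads a file listing on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a listing on stdin (e.g. lsinitrd
# output) contains the kernel module file "<name>.ko". Modules may be
# compressed (".ko.xz"), so only the "/<name>.ko" prefix is matched.
initrd_has_module() {
  grep -q -F "/$1.ko"
}
```

For example: `lsinitrd | initrd_has_module nvidia && echo "nvidia is in the initrd"`.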

That didn’t change anything. I also tried booting into multi-user mode and then running

sudo systemctl start gdm

This has the same problem as booting into graphical mode: I get the “Oh no!” screen.

However, startx gets me right to the desktop, and everything appears to work.

I will try to open a bug with CentOS and see if someone there has a solution.

I opened a bug against RHEL 8.2:

https://bugzilla.redhat.com/show_bug.cgi?id=1851448

It looks like this is a bug related to SELinux. It is not yet clear whether the problem is in RHEL or in the NVIDIA packages.
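While the bug is open, SELinux denials can be inspected directly. `sudo ausearch -m avc -ts recent` lists recent AVC records, and temporarily switching to permissive mode with `sudo setenforce 0` before restarting gdm is a common way to check whether SELinux is the cause (`sudo setenforce 1` restores enforcing mode). The sketch below does a rough equivalent with grep on a raw audit log; the process names to look for (for example Xorg or gnome-shell) are assumptions, and avc_denials_for is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: print SELinux AVC denial records for process
# name $1 from audit log $2 (normally /var/log/audit/audit.log, which
# only root can read). Denials are "type=AVC" records containing
# "denied" and comm="<process>".
avc_denials_for() {
  comm=$1
  log=$2
  grep 'type=AVC' "$log" | grep 'denied' | grep -F "comm=\"$comm\""
}
```

Usage, as root: `avc_denials_for Xorg /var/log/audit/audit.log`.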