I have a fresh install of CentOS 8.2, fully updated, with EPEL and PowerTools enabled. I then installed the latest CUDA version from the official site. I have a 1660 graphics card.
cuda 11 install
When I restart after installing CUDA, GDM crashes with the “oops a problem has occurred” screen. If I boot up in multi-user mode everything works as expected. I can even run “startx” and get to the desktop just fine. I uncommented the line “WaylandEnable=false” in /etc/gdm/custom.conf with no change. Can someone please help me get GDM to not crash so I can log in to my workstation normally?
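For anyone following along, here is a minimal sketch of how to confirm that the Wayland setting is really active (uncommented) in the GDM config. The function name `wayland_disabled` is made up for illustration; only the file path and the setting come from the post above.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only if the given GDM config file
# contains an uncommented "WaylandEnable=false" line.
wayland_disabled() {
    conf_file="$1"
    # A line that starts with '#' is a comment and does not match,
    # because only leading whitespace is allowed before the key.
    grep -Eq '^[[:space:]]*WaylandEnable[[:space:]]*=[[:space:]]*false' "$conf_file"
}
```

It could be called as `wayland_disabled /etc/gdm/custom.conf && echo "Wayland is disabled for GDM"`.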
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. You will have to rename the file ending to something else since the forum software doesn’t accept .gz files (nifty!).
Here is the bug report.
nvidia-bug-report.log.gz.log (1.1 MB)
Please try disabling the intel igpu in bios. Alternatively, create /etc/X11/xorg.conf which only contains
Option "AllowEmptyInitialConfiguration" "true"
nvidia-bug-report.log.gz.log (1.1 MB)
I disabled the iGPU in the BIOS with no change in behavior. Here is another log taken after disabling the iGPU. This worked on this same box with CentOS 7 without any problem.
Everything seems to work fine; the X server is starting correctly. Maybe the client libs are broken. Please post the output of
ls -l /lib64/libGL*
libgl.txt (2.2 KB)
Here is the output. I think everything is mostly working. Like I said, if I change to a tty and then manually run startx, I can get to the desktop without a problem. It seems to be a problem with GDM. I notice that when the driver gets installed, some packages with “egl-wayland” in the name get installed as well.
Here is an Xorg log that might help.
Xorg.1.log (10.3 KB)
Specifically, the line below may mean something:
NVIDIA(GPU-0): Failed to acquire modesetting permission.
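For anyone who wants to check their own logs for the same message, a small sketch follows. The function name `find_modeset_failures` is made up for illustration, and the log location is an assumption: it depends on how the X server was started.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA driver or Xorg tooling):
# print every line of an Xorg log that reports the modesetting failure.
find_modeset_failures() {
    log_file="$1"
    # -n prefixes each match with its line number; -F matches the text literally.
    grep -nF "Failed to acquire modesetting permission" "$log_file"
}
```

It could be called as `find_modeset_failures /var/log/Xorg.1.log` (path assumed). The function prints nothing and returns a non-zero status when the message is absent.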
That’s helpful, didn’t notice it in the logs before. Please remove “rhgb” from kernel parameters. If that doesn’t help, please post the output of
ls -l /dev/nvid* /dev/dri/*
Removing the rhgb parameter didn’t help the GDM login.
Here is the output from /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-193.6.3.el8_2.x86_64 root=/dev/mapper/cl_tc--server-root ro crashkernel=auto resume=/dev/mapper/cl_tc--server-swap rd.lvm.lv=cl_tc-server/root rd.lvm.lv=cl_tc-server/swap quiet rd.driver.blacklist=nouveau
Here is the other output you asked for after removing the rhgb parameter
ls -l /dev/nvid* /dev/dri/*
crw-rw----+ 1 root video 226, 0 Jun 24 06:55 /dev/dri/card0
crw-rw-rw-. 1 root render 226, 128 Jun 24 06:55 /dev/dri/renderD128
crw-rw-rw-. 1 root root 195, 0 Jun 24 06:55 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Jun 24 06:55 /dev/nvidiactl
crw-rw-rw-. 1 root root 195, 254 Jun 24 06:55 /dev/nvidia-modeset
lrwxrwxrwx. 1 root root 8 Jun 24 06:55 pci-0000:01:00.0-card -> ../card0
lrwxrwxrwx. 1 root root 13 Jun 24 06:55 pci-0000:01:00.0-render -> ../renderD128
Just some more information about what exactly is happening.
When I boot up I get this screen:
If I immediately switch to a tty, it doesn’t work and I get this message:
If I press enter and “Log out” I get a blank screen and then back to the “Oh No!” screen. Now when I switch to a tty it works as expected.
Please let me know if there is anything else that will be helpful in diagnosing this issue.
Please post the output of
Please try adding it to the video group
sudo usermod -a -G video gdm
No change. I still get the “Oh no!” screen
gdm : gdm video
I don’t really have an idea what could be wrong. Maybe the driver just gets loaded too late. Please run
sudo dracut -f
to recreate the initrd.
That didn’t change anything. I boot up in multi-user mode and then run
sudo systemctl start gdm
This has the same problem that booting into the graphical mode does. I get the “Oh no” screen.
Although “startx” gets me right to the desktop and everything appears to work.
I will try to open a bug with CentOS and see if someone there has a solution.
I opened a bug on RHEL 8.2
It looks like this is a bug related to SELinux. It is not yet clear whether this is a problem in RHEL or a bug in the NVIDIA packages.
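Since the bug points at SELinux, one way to look for related denials is to scan the audit log for AVC messages. The sketch below assumes the default auditd log at /var/log/audit/audit.log; `count_avc_denials` is a name made up here, not an existing tool, and it only counts denials without saying which one (if any) is responsible for the GDM crash.

```shell
#!/bin/sh
# Hypothetical helper: count SELinux AVC denial records in an audit log file.
count_avc_denials() {
    audit_log="$1"
    # Keep only AVC records, then count those that are denials.
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function from reporting an error in that case.
    grep 'type=AVC' "$audit_log" | grep -c 'denied' || true
}
```

It could be called as `sudo sh` followed by `count_avc_denials /var/log/audit/audit.log`, since the audit log is normally readable only by root. On a live system, `getenforce` shows the current SELinux mode and `sudo ausearch -m avc -ts boot` lists the denials recorded since boot, provided the audit tools are installed.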