Stalled boot of nouveau-disabled Fedora 24

I am trying to boot into a nouveau-disabled configuration of Fedora 24, so I can run the 375.20 install script, but my system hangs before offering a login prompt. It boots and operates fine with the Radeon card installed (which was present when Fedora 24 was clean installed), running 4.8.11-200 and 4.8.10-200 kernels with Nouveau enabled and disabled in each (all four possibilities). I am WAY past the point of doing another clean install, so I have to figure this out without that option on the table.

I compared a successful boot log (with the Radeon card installed) with a boot log where the system never offers a login prompt (nouveau disabled, 1070 card installed). The successful boot includes audit entries for USER_AUTH, USER_ACCT and CRED_ACQ just before it runs /usr/bin/login; the nouveau-disabled boot with the 1070 card installed stalls right before that, after completing NetworkManager startup and adjusting the system clock (which happens in both cases).
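One way to compare the two boots without timestamps and PIDs drowning out the real differences is to normalize each log before diffing. This is a minimal sketch, assuming the default `journalctl` short output format (`Mon DD HH:MM:SS host unit[pid]: message`) and a persistent journal so that earlier boots are still available; the function name `normalize_log` is hypothetical.

```shell
# normalize_log: read journal lines on stdin and strip the parts that always
# differ between boots (timestamp, hostname, PID) so that diff shows real changes.
normalize_log() {
  sed -E \
    -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ //' \
    -e 's/\[[0-9]+\]:/:/'
}
```

Usage: find the boot offsets with `journalctl --list-boots`, then for example `journalctl -b -2 | normalize_log > good.txt`, `journalctl -b -1 | normalize_log > bad.txt` and `diff good.txt bad.txt`. `journalctl -o cat` is a simpler alternative that prints only the message text, at the cost of also dropping the unit name.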

Any suggestions on what might be going on, what I can look into, … ?

Did you disable the graphical login manager? If it’s trying to start X and failing, that might explain why the boot seems to just freeze.

I don’t know about Fedora specifically, but most distributions will boot to a mode you can install the driver from if you pass “single” or “3” on the kernel command line from the boot loader.

it’s booting to run level 3 (multi-user); it’s not trying to startx (which would be run level 5)

I guess you know that in Fedora 24 you have to set the proper symbolic link to boot into the desired runlevel.
Level 3 should run fine (in my experience), so there is probably something else going on, anyway a couple of hints:

  1. Ctrl+Alt+F2 (this is what I do without the need to change runlevel)
  2. ssh from the outside

If it boots into multi-user.target without a hitch with the Radeon card installed, can you not conclude that the systemd links are set up properly?

$ systemctl get-default shows…
multi-user.target

that’s old school run-level 3

I would think that using systemctl to set the target is safer than manually setting /etc/systemd/system links.

display-manager.service -> /usr/lib/systemd/system/gdm.service (in case you’re going to ask)

but… then I revert to the argument that this stuff is not being changed between boots and it boots up fine with a Radeon card installed, so I doubt that’s the problem.

Why am I not seeing USER_AUTH, USER_ACCT, CRED_ACQ, /usr/bin/login entries in the no-login boot.log with a 1070 card installed? Is there something about the 4.8.10-200 | 4.8.11-200 kernel with nouveau blacklisted?

I did not conclude anything; I said “I guess you know”.
Anyway, that was not the relevant part. As I said, runlevel 3 should work fine even without nouveau, so something is not set up properly.

  • Do you have any display output during boot?
  • It may be necessary to disable nouveau in the kernel command line, or verify that nouveau is disabled properly. Check section 8.1 of the readme
    I don’t know if the following may work, anyway:
  • Can you Ctrl+Alt+F2?
  • Can you access via ssh from an external box?
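To check whether nouveau is actually kept out (the "verify that nouveau is disabled properly" point above), you can look at `/proc/modules`, which is what `lsmod` reads. A small sketch; the helper name `module_loaded` is hypothetical, and the optional second argument only exists so it can be tried against a saved copy of the file.

```shell
# module_loaded NAME [MODULES_FILE]: succeed if kernel module NAME is loaded.
# /proc/modules lists one loaded module per line, with the name in field 1.
module_loaded() {
  local name=$1 file=${2:-/proc/modules}
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

Usage: `if module_loaded nouveau; then echo "nouveau is still loaded"; fi`, which is equivalent to checking `lsmod | grep nouveau` by eye.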

I can’t say, but the problem is most probably not with the login procedure, so maybe something earlier in the log file will give a hint. There should be something about the display card setup.

Yet another hint:
Does it work with nouveau? (i.e. not blacklisting nouveau)

You can’t run the install script with nouveau active, so that’s not an option.

Yes, there is display output in the no-login boot (a couple of inconsequential error messages that appear in both cases).

If by that you mean the configuration, I have …

with nouveau

linux16 /vmlinuz-4.8.12-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8

linux16 /vmlinuz-4.8.11-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8

linux16 /vmlinuz-4.8.10-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet

without nouveau

linux16 /vmlinuz-4.8.12-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rd.driver.blacklist=nouveau rhgb quiet LANG=en_US.UTF-8

linux16 /vmlinuz-4.8.11-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rd.driver.blacklist=nouveau rhgb quiet LANG=en_US.UTF-8

linux16 /vmlinuz-4.8.10-200.fc24.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rd.driver.blacklist=nouveau rhgb quiet
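Since the three "without nouveau" entries differ from the others only by `rd.driver.blacklist=nouveau`, it may be worth confirming that the entry picked in GRUB is the one that actually booted. `/proc/cmdline` holds the command line of the running kernel; the helper below is a hypothetical sketch for checking it, for example over SSH.

```shell
# cmdline_has ARG [CMDLINE_FILE]: succeed if the kernel was booted with ARG
# as one whitespace-separated word (e.g. rd.driver.blacklist=nouveau).
cmdline_has() {
  local arg=$1 file=${2:-/proc/cmdline}
  tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$arg"
}
```

Usage: `cmdline_has rd.driver.blacklist=nouveau && echo "nouveau was blacklisted at boot"`. Matching whole words avoids false hits such as `nouveau` alone matching inside the blacklist argument.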

I have not tried SSH, largely because I am not trying to bring up X: the display shows text output with either graphics card installed (and, with the Radeon, with nouveau in or out), and a login (which is not running at boot time with the 1070 card installed) would still be required over SSH too.

Both boots finish starting NetworkManager and adjusting the system clock; in a Radeon boot (but not a 1070 boot), the very next entries are USER_AUTH, USER_ACCT, CRED_ACQ, and /usr/bin/login.

I know this; I pointed it out so you would first get to a fully functional system with the nvidia card and nouveau, then disable nouveau according to section 8.1 of the readme, reboot, and install the nvidia driver (I have been doing it this way from Fedora Core up to Fedora 24).

This looks the same as mine; obviously you also need to blacklist nouveau in /etc/modprobe.d.

SSH has nothing to do with X, it allows you to get a (remote) shell even if your console is stuck (I use it when I upgrade and I am too lazy to switch runlevel)
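Once you are in over SSH, systemd can tell you directly what the stalled boot is waiting for: `systemctl list-jobs` prints the queued jobs, and `systemctl status getty@tty1.service` shows whether the unit that provides the console login prompt ever started. The filter below is a hypothetical sketch that assumes the usual `JOB UNIT TYPE STATE` column layout of `list-jobs`.

```shell
# pending_units: read `systemctl list-jobs` output on stdin and print each
# queued unit with its state, skipping the header and the summary line.
pending_units() {
  awk '$1 ~ /^[0-9]+$/ && ($4 == "running" || $4 == "waiting") { print $2, $4 }'
}
```

Usage: `systemctl list-jobs | pending_units`. A unit stuck in `running` while the login-related units sit in `waiting` would point at what is blocking the prompt.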

NetworkManager and the clock are the last things, but well before that there may be (and probably is) some indication of how the display card is set up.

I have blacklisted nouveau in /etc/modprobe.d - it’s one of the suggested steps in https://www.if-not-true-then-false.com/2015/fedora-nvidia-guide/

I’ll look into SSH.

Is there something that I can look for in the journal to find the graphics card setup (it’s a HUGE file)?

I would refer to:
http://us.download.nvidia.com/XFree86/Linux-x86_64/375.20/README/index.html

Here’s my config (with kernel 4.8.7)
kernel cmd line: “nomodeset rhgb quiet rd.driver.blacklist=nouveau”
in /etc/modprobe.d/blacklist-nouveau.conf:
blacklist nouveau
options nouveau modeset=0
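A commonly recommended follow-up to the two `/etc/modprobe.d` lines above is rebuilding the initramfs, because Fedora's dracut-generated initramfs carries its own copy of the modprobe configuration and can otherwise still load nouveau early in boot. A minimal sketch of the whole step, assuming root and the file name used above; the helper name is hypothetical, and the target directory is a parameter only so it can be tried on a scratch directory first.

```shell
# write_nouveau_blacklist [DIR]: write the two blacklist lines quoted above
# into DIR/blacklist-nouveau.conf (default DIR: /etc/modprobe.d).
write_nouveau_blacklist() {
  local dir=${1:-/etc/modprobe.d}
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
    > "$dir/blacklist-nouveau.conf"
}

# As root, after writing the real file, rebuild the initramfs for the running
# kernel so the early-boot copy of modprobe.d matches:
#   write_nouveau_blacklist && dracut --force
```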

You may look at /var/log/messages and search for the timestamp of the boot; in my case it starts with:
kernel: Linux version 4.8.7-200 …
After that, my log is about the nvidia driver; in any case, try searching for vgaarb.
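To cut a huge log down to the graphics-related part, a keyword filter is often enough. A hypothetical sketch; the keyword list (VGA arbiter, DRM, and the three driver names that come up in this thread) is an assumption and can be extended.

```shell
# gpu_lines: read log lines on stdin and keep the ones that usually describe
# graphics setup (VGA arbiter, DRM, and the nouveau/nvidia/radeon drivers).
gpu_lines() {
  grep -iE 'vgaarb|drm|nouveau|nvidia|radeon'
}
```

Usage: `journalctl -b -1 -k | gpu_lines` shows the kernel messages of the previous boot, or `gpu_lines < /var/log/messages` on systems where rsyslog still writes that file.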

Also, I would not neglect to have the system running with nouveau first

The vgaarb entries are the same in both cases (Radeon card -> login prompt; and 1070 w/o nouveau -> no login prompt offered).

Nouveau runs fine with a Radeon card installed; I am trying to migrate away from the Radeon card to the new 1070 graphics card.

Well, now I’m somewhat worse off…

I did get in over SSH with the 1070 card installed, nouveau blacklisted, and the display hung without a login prompt. I ran the 375.20 installer over SSH successfully and took the options to register the kernel module sources with DKMS, install the 32-bit compatibility libraries, and generate an nVidia-friendly xorg.conf (the install script supposedly backs up the existing xorg.conf somewhere). The install script went for the GLVND GLX and GLVND EGL client libraries and skipped the rest.

I couldn’t find a backed up copy of xorg.conf (yeah, I know, I should have backed it up myself and not trusted the install script to do it for me.), so I renamed the xorg.conf created by the install script to xorg.conf_nvidia, swapped back to the Radeon card, generated a baseline xorg.conf and renamed it to xorg.conf_radeon. That way, I can create a symbolic link to whichever xorg.conf_ I want to use based on which card is installed.
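The symlink scheme described above can be scripted so the link always points at the file for the card that is physically installed, with a timestamped backup taken before anything (including an installer) touches it. A sketch, assuming the `xorg.conf_radeon` / `xorg.conf_nvidia` names from this post and `/etc/X11` as the directory; the function names are hypothetical.

```shell
# backup_xorg [DIR]: copy DIR/xorg.conf to xorg.conf.bak-YYYYmmdd-HHMMSS,
# dereferencing a symlink so the backup holds the real contents.
backup_xorg() {
  local dir=${1:-/etc/X11}
  [ -e "$dir/xorg.conf" ] || return 0   # nothing to back up
  cp -L "$dir/xorg.conf" "$dir/xorg.conf.bak-$(date +%Y%m%d-%H%M%S)"
}

# use_xorg CARD [DIR]: point DIR/xorg.conf at DIR/xorg.conf_CARD
# (CARD is "radeon" or "nvidia" in the naming used above).
use_xorg() {
  local card=$1 dir=${2:-/etc/X11}
  [ -f "$dir/xorg.conf_$card" ] || { echo "no xorg.conf_$card in $dir" >&2; return 1; }
  # relative target, so the link keeps working if the directory is copied
  ln -sfn "xorg.conf_$card" "$dir/xorg.conf"
}
```

Usage (as root): `backup_xorg && use_xorg nvidia` before shutting down to swap in the 1070, and `use_xorg radeon` before swapping back. Running `backup_xorg` first matters because `ln -sf` silently replaces a regular xorg.conf.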

I still can’t get a login prompt with the 1070 card installed and xorg.conf redirected to xorg.conf_nvidia, regardless of whether nouveau is enabled or blacklisted.

When I swap the link back to the Radeon X conf, shut down, swap the Radeon card back in, and boot with nouveau enabled, I still get a login prompt, but I can no longer successfully startx: I get a mostly white screen with some blue and red static, then all black, then I am back at the prompt with the second-to-last message saying “xinit: connection to X server lost” and nothing else out of the ordinary.
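When `startx` drops back to the prompt with "connection to X server lost", the X server's own log is usually the first place to look. On Fedora 24, an X server started with `startx` as a normal user typically runs rootless and logs to `~/.local/share/xorg/Xorg.0.log`, while a root-started server logs to `/var/log/Xorg.0.log`. A hypothetical sketch that pulls out the error lines, skipping the marker legend near the top of every log (the legend wording is assumed from typical Xorg logs):

```shell
# xorg_errors [LOGFILE]: print the (EE) error lines from an Xorg log,
# ignoring the legend line that explains what "(EE)" means.
xorg_errors() {
  local log=$1
  if [ -z "$log" ]; then
    log=$HOME/.local/share/xorg/Xorg.0.log    # rootless (user) X server
    [ -r "$log" ] || log=/var/log/Xorg.0.log  # X server started as root
  fi
  grep -F '(EE)' "$log" | grep -vF '(WW) warning, (EE) error'
}
```

Usage: run `xorg_errors` right after a failed `startx`; with the overwritten GL libraries listed later in this thread, GLX-related errors would be a plausible thing to find there.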

If the install script really backed up my original xorg.conf, where is it?

I’d like to be able to recover the ability to startx with the Radeon card installed and I’m still facing no-login on the workstation monitor with the 1070 card installed and xorg.conf pointing at the nVidia X conf.

Suggestions?

(back after a while, I had missed the last posts)

What I recommended was to have the nvidia card+nouveau work first.

This might be because the nvidia installation has replaced the OpenGL libraries.
I suggest you get in with ssh again, and uninstall the nvidia driver, the installer has a --uninstall option (run it with -A to see a list of all options). Hopefully this will roll back its changes.

Yeah, it is what else happens around them that might possibly give some hint.

Uninstall does not help and http://www.fedoraforum.org/forum/showthread.php?p=1767535 suggests the reason.

$ for i in $(rpm -qa 'mesa-*' 'xorg-x11-*'); do echo "$i"; rpm -V "$i"; echo "--------"; done

shows which files have been overwritten, but dnf will no longer reinstall the suite of damaged packages because of conflicts (there have been a number of dnf updates since I ran the nvidia install script), even with erasing allowed.

Obviously, I have no choice but to continue going forward with the 1070 card. Please suggest to the team responsible for the install script that accepted engineering practice dictates not overwriting files sourced from distro packages.

Now that it’s obvious that I cannot return to a working Xwindows with the Radeon card installed without reinstalling the distribution from scratch, I’m now focused solely on getting the 1070 card running X. Since it has been some time since I began this process I checked to see if there was a newer driver available, discovering that the 375.20 page does not mention the GTX1070 (at least not now), so I downloaded 367.27 (which does). After uninstalling 375.20 and running the 367.27 install script, it indicates that I don’t need to build a DKMS module, so re-running without that, I get the following…

ERROR: An error occurred while performing the step: “Building kernel modules”. See /var/log/nvidia-installer.log for details.

In that log, after

CONFTEST: pci_dma_mapping_error

there are many, many lines like this:
/tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm_linux.h:566:13: error: redefinition of ‘radix_tree_empty’

The tail end of the installer log looks like this:

In file included from /usr/src/kernels/4.8.16-200.fc24.x86_64/include/linux/fs.h:14:0,
from /usr/src/kernels/4.8.16-200.fc24.x86_64/include/linux/poll.h:9,
from /tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/common/inc/nv-linux.h:92,
from /tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm_linux.h:39,
from /tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm_common.h:62,
from /tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm8_fault_buffer_flush_test.c:24:
/usr/src/kernels/4.8.16-200.fc24.x86_64/include/linux/radix-tree.h:127:20: note: previous definition of ‘radix_tree_empty’ was here
static inline bool radix_tree_empty(struct radix_tree_root *root)
^~~~~~~~~~~~~~~~
/usr/src/kernels/4.8.16-200.fc24.x86_64/scripts/Makefile.build:289: recipe for target ‘/tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm8_fault_buffer_flush_test.o’ failed
make[3]: *** [/tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel/nvidia-uvm/uvm8_fault_buffer_flush_test.o] Error 1
make[3]: Target ‘__build’ not remade because of errors.
/usr/src/kernels/4.8.16-200.fc24.x86_64/Makefile:1477: recipe for target ‘module/tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel’ failed
make[2]: *** [module/tmp/selfgz7460/NVIDIA-Linux-x86_64-367.27/kernel] Error 2
make[2]: Target ‘modules’ not remade because of errors.
make[2]: Leaving directory ‘/usr/src/kernels/4.8.16-200.fc24.x86_64’
Makefile:150: recipe for target ‘sub-make’ failed
make[1]: *** [sub-make] Error 2
make[1]: Target ‘modules’ not remade because of errors.
make[1]: Leaving directory ‘/usr/src/kernels/4.8.16-200.fc24.x86_64’
Makefile:81: recipe for target ‘modules’ failed
make: *** [modules] Error 2
ERROR: The nvidia kernel module was not created.

Any suggestions?
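With a build log this long, the first compiler error is usually the informative one; the rest tend to cascade from it. Here the first errors are redefinitions of `radix_tree_empty`, and the log's own "previous definition" note points into the 4.8.16 kernel headers. That suggests (though does not prove) that the 367.27 sources predate this kernel, so a newer driver release is a more likely fix than changes on the system side. A small sketch for pulling out the first few errors; the helper name is hypothetical.

```shell
# first_errors [LOGFILE] [N]: print the first N compiler "error:" lines
# from the NVIDIA installer log (defaults: /var/log/nvidia-installer.log, 5).
# The match is case-sensitive, so the installer's own "ERROR:" lines and
# make's "Error 2" lines are not included.
first_errors() {
  local log=${1:-/var/log/nvidia-installer.log} n=${2:-5}
  grep -m "$n" -F 'error:' "$log"
}
```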

Naturally, I still have no login prompt on the primary displays, so all of the foregoing has been done via SSH. I could use suggestions on restoring that capability with the 1070 board installed (I can still get a login prompt with the radeon board installed, though startx fails due to the corruption of various X packages). For reference, here’s the result of that one line script above…

mesa-libGLES-devel-12.0.3-2.fc24.x86_64
....L....    /usr/lib64/libGLESv2.so

xorg-x11-apps-7.7-15.fc24.x86_64

xorg-x11-utils-7.5-21.fc24.x86_64

xorg-x11-server-Xorg-1.18.4-5.fc24.x86_64
....L....    /usr/lib64/xorg/modules/extensions/libglx.so
missing /usr/lib64/xorg/modules/libglamoregl.so

mesa-libEGL-11.2.1-1.20160501.fc24.x86_64
SM5......    /usr/lib64/libEGL.so.1
missing /usr/lib64/libEGL.so.1.0.0

xorg-x11-drv-openchrome-0.5.0-1.fc24.x86_64

xorg-x11-drv-synaptics-1.8.3-2.fc24.x86_64

xorg-x11-drv-intel-2.99.917-24.20160712.fc24.x86_64

mesa-libGL-12.0.3-2.fc24.x86_64
....L....    /usr/lib64/libGL.so.1
missing /usr/lib64/libGL.so.1.2.0

xorg-x11-drv-wacom-0.32.0-2.fc24.x86_64

xorg-x11-xinit-1.3.4-11.fc24.x86_64

xorg-x11-server-Xwayland-1.18.4-5.fc24.x86_64

mesa-libGLU-devel-9.0.0-10.fc24.x86_64

mesa-libEGL-12.0.3-2.fc24.x86_64
SM5......    /usr/lib64/libEGL.so.1
missing /usr/lib64/libEGL.so.1.0.0

mesa-libGLES-12.0.3-2.fc24.x86_64
SM5....T.    /usr/lib64/libGLESv2.so.2
missing /usr/lib64/libGLESv2.so.2.0.0

xorg-x11-drv-fbdev-0.4.3-24.fc24.x86_64

mesa-libgbm-11.2.1-1.20160501.fc24.x86_64
S.5......    /usr/lib64/libgbm.so.1.0.0

xorg-x11-fonts-misc-7.5-16.fc24.noarch

mesa-filesystem-11.2.1-1.20160501.fc24.x86_64

xorg-x11-drv-nouveau-1.0.12-4.fc24.x86_64

xorg-x11-xbitmaps-1.1.1-9.fc24.noarch

xorg-x11-fonts-ISO8859-1-100dpi-7.5-16.fc24.noarch

mesa-libGLU-9.0.0-10.fc24.x86_64

xorg-x11-xauth-1.0.9-5.fc24.x86_64

xorg-x11-drv-vmware-13.0.2-11.20150211git8f0cf7c.fc24.x86_64

xorg-x11-drv-evdev-2.10.3-1.fc24.x86_64

mesa-libglapi-12.0.3-2.fc24.x86_64

mesa-libGL-devel-12.0.3-2.fc24.x86_64
....L....    /usr/lib64/libGL.so

xorg-x11-docs-1.7.1-3.fc24.noarch

xorg-x11-drv-vesa-2.3.2-24.fc24.x86_64

xorg-x11-server-Xwayland-1.18.3-2.fc24.x86_64
S.5......    /usr/bin/Xwayland

mesa-libxatracker-12.0.3-2.fc24.x86_64

mesa-dri-drivers-12.0.3-2.fc24.x86_64

xorg-x11-proto-devel-7.7-19.fc24.noarch

xorg-x11-fonts-Type1-7.5-16.fc24.noarch

xorg-x11-drv-libinput-0.19.0-2.fc24.x86_64

xorg-x11-xtrans-devel-1.3.5-3.fc24.noarch

xorg-x11-drv-vmmouse-13.1.0-3.fc24.x86_64

xorg-x11-server-common-1.18.3-2.fc24.x86_64
..5......  d /usr/share/man/man1/Xserver.1.gz

xorg-x11-server-utils-7.7-19.fc24.x86_64

mesa-filesystem-12.0.3-2.fc24.x86_64

xorg-x11-drv-qxl-0.1.4-7.fc24.x86_64

xorg-x11-xkb-utils-7.7-17.fc24.x86_64

xorg-x11-server-common-1.18.4-5.fc24.x86_64

mesa-libwayland-egl-12.0.3-2.fc24.x86_64

mesa-libEGL-devel-12.0.3-2.fc24.x86_64
....L....    /usr/lib64/libEGL.so

xorg-x11-fonts-ISO8859-1-75dpi-7.5-16.fc24.noarch

mesa-libwayland-egl-11.2.1-1.20160501.fc24.x86_64
..5......    /usr/lib64/libwayland-egl.so.1.0.0

xorg-x11-font-utils-7.5-31.fc24.x86_64

mesa-libgbm-12.0.3-2.fc24.x86_64

Hopefully that will be useful.
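For reading the listing above: each line `rpm -V` prints for a changed file starts with a nine-character field in which every position stands for one check (S size, M mode/type, 5 digest, D device, L symlink target, U owner, G group, T mtime, P capabilities), a dot means the check passed and `?` means it could not be performed; `missing` means the file is gone. A hypothetical decoder that spells the field out:

```shell
# decode_rpmv FLAGS: translate an rpm -V flag field such as "SM5....T."
# into the names of the checks that failed.
decode_rpmv() {
  local flags=$1 i=1 out="" name c
  for name in size mode digest device link user group mtime caps; do
    c=$(printf '%s\n' "$flags" | cut -c "$i")   # character at position i
    case $c in
      .|"") ;;                    # check passed (or field shorter than expected)
      "?") out="$out $name?" ;;   # check could not be performed
      *) out="$out $name" ;;      # check failed
    esac
    i=$((i + 1))
  done
  printf '%s\n' "${out# }"
}
```

Usage: `decode_rpmv 'SM5....T.'` prints `size mode digest mtime`, which for the libGLESv2.so.2 line above means the file's size, type, contents and timestamp all differ from what the package installed.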

I am not sure it is about conflicts. It might be that the package versions you have installed are no longer in the ‘updates’ repository because they have been superseded by more recent updates.
The easiest approach I can imagine is:
dnf update
dnf reinstall $(rpm -qa 'mesa-*')

Before running such commands, it is advisable to read the manual for each of them carefully.
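Rather than reinstalling every `mesa-*` package, the output of the verification loop above can be reduced to just the packages that reported a problem. A sketch, assuming exactly that loop's output layout (package name, any `rpm -V` lines, then a `--------` separator); the function name is hypothetical. Since it prints full name-version-release strings, `dnf reinstall` can only restore them if the enabled repositories still carry those exact versions, hence the `dnf update` first.

```shell
# damaged_packages: read the output of the rpm -V loop on stdin and print the
# name of every package that was followed by at least one rpm -V line.
damaged_packages() {
  awk '
    $0 == "--------" { name = ""; next }             # end of one package block
    name == ""       { name = $0; shown = 0; next }  # first line: package name
    !shown           { print name; shown = 1 }       # any further line: damaged
  '
}
```

Usage: `for i in $(rpm -qa 'mesa-*' 'xorg-x11-*'); do echo "$i"; rpm -V "$i"; echo "--------"; done | damaged_packages`, then pass that list to `dnf reinstall`.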

I cannot speak for the team, but I imagine they have to replace (the links to) the OpenGL libraries in order to use the accelerated routines.

I see that 375.26 lists the GTX 1070
But I still recommend that you get the system running with the 1070 card+nouveau and no nvidia driver running first.
If necessary, you may also run this:
dnf reinstall xorg-x11-drv-nouveau
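A quick way to confirm which driver is actually driving the 1070 (nouveau now, nvidia later) is the "Kernel driver in use" field that `lspci -k` prints under each device. A hypothetical filter for just the display controllers:

```shell
# gpu_driver: read `lspci -k` output on stdin and print the kernel driver in
# use for each VGA or 3D controller (one line per device).
gpu_driver() {
  awk '
    /^[^[:space:]]/ { gpu = ($0 ~ /VGA compatible controller|3D controller/) }
    gpu && /Kernel driver in use:/ { sub(/.*Kernel driver in use: */, ""); print }
  '
}
```

Usage: `lspci -k | gpu_driver`. Device header lines start in column one and their detail lines are indented, which is what the first awk rule relies on.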

ok, I am back to getting no errors from:

$ for i in $(rpm -qa 'mesa-*' 'xorg-x11-*'); do echo "$i"; rpm -V "$i"; echo "--------"; done

I’m now running the 4.9.5-100.fc24.x86_64 kernel. The default is to boot with nouveau enabled, but I have another grub2 entry that blacklists nouveau.

I downloaded 375.26 but have not installed it.

I never boot into graphics mode (preferring to startx manually after login), so my boot runlevel is 3.

Inconveniently, I am stuck in the very same situation as before: I do not get a login prompt on the main display (all of this was done over SSH) with the 1070 card installed. I have not attempted a switch back to the radeon card to verify that the system still offers a login prompt on the main display after reinstalling the X/mesa distro packages.

The boot log has almost 2300 lines in it and no obvious errors, so I really need suggestions there.

I know I already said this multiple times before, so this is going to be the last one.
I think that before you install 375.26 (which, by the way, works perfectly on my system) you should enable nouveau and verify that you get a fully functional system with it first (both console and X-windows; not Wayland, which I have never tried). Nouveau works and is integrated with Linux out of the box, so there are no installation problems there. Its only drawback is performance (speed), which is lower than with the proprietary driver.

If this does not work, then you know that you have other problems going on, which you have to solve, and installing the nvidia driver will only make your life more difficult.