NVIDIA drivers prevent boot on fresh Ubuntu 20.04 LTS with an eGPU that works well with the default Nouveau driver

After installing NVIDIA graphics drivers, I cannot boot into the desktop environment on a fresh install of Ubuntu 20.04 LTS on an Intel NUC10FNH with an Akitio Node eGPU enclosure and a GeForce GTX 1080 Ti graphics card.

Processor: Intel® Core™ i7-10710U CPU @ 1.10GHz × 12
Graphics: Mesa Intel® UHD Graphics (CML GT2) / NV132

I’ve tried nearly all driver versions listed by ubuntu-drivers devices, from 418 to 455, and they all hang the boot on a black screen with a flashing cursor after a disk check (although SSH still works). I’ve tried every documented install method, from the Additional Drivers GUI, to ubuntu-drivers autoinstall, to a direct install of a specific version via PPA and apt. The default built-in Nouveau driver works flawlessly on the eGPU monitor; however, I need to install CUDA and use nvidia-smi.
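For reference, these were roughly the install methods, with 455 as an example version:

```
# List drivers Ubuntu recommends for the detected hardware
ubuntu-drivers devices

# Method 1: install the recommended driver automatically
sudo ubuntu-drivers autoinstall

# Method 2: install a specific version from the graphics-drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-455
```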

Here are some debug logs I collected after following this thread:
nvidia-bug-report-part1.txt (2.2 MB) nvidia-bug-report-part2.txt (2.0 MB)

As far as I can tell, there’s no special blacklisting of nvidia happening anywhere: there are no nvidia-specific files in /etc/modprobe.d/, and running sudo prime-select query over SSH shows nvidia while the boot is hanging. I originally had Secure Boot enabled when first attempting the graphics driver install through the GUI, but disabled it after the first failed restart.
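The checks, for reference (run over SSH while the boot was hanging):

```
# Look for nvidia blacklist entries in the modprobe configuration
grep -ri nvidia /etc/modprobe.d/ /lib/modprobe.d/

# Confirm which GPU PRIME has selected
sudo prime-select query
```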

Thanks!


Here are some additional debug files collected through SSH (I could not upload more than three files in the OP, or files larger than 4.4 MB, so I split the debug report above in two):
grep_nvidia_in_lib_udev_rules_d.txt (2.0 KB) lib_modprobe_d_nvidia_graphics-drivers_conf.txt (81 Bytes) lib_modprobe_d_nvidia_kms_conf.txt (109 Bytes)

lshw.txt (1.1 KB) nvidia-smi.txt (1.5 KB)

Can we get the output of journalctl?

It’s fairly noisy, so instead of splitting it into 14 files, here’s a Dropbox link:

Could the following boot error be relevant?

```
Initramfs unpacking failed: Decoding failed
```
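For anyone searching, a quick way to check for the same message in the boot log (the grep pattern is just an example):

```
journalctl -b | grep -i 'initramfs'
```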

Thanks!


Any insight? Are there any other resources about NVIDIA boot behaviour in Ubuntu 20.04?


I believe I ran into the same issue when using a NUC with a Razer Core X and a 2080 Ti on Ubuntu 20.04. The issue for me was that Xorg was unloading the nvidia driver because I had not set the AllowExternalGpus option in my Xorg config file.

Check whether you’re running into that issue by looking at Xorg.0.log (or something similarly named) in the /var/log directory. If it’s the lack of that flag, you’ll probably see a message about it somewhere in there.
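Something along these lines should find it (the grep pattern is just a guess at what to look for):

```
grep -i 'external' /var/log/Xorg.0.log
```

And if that is the cause, here’s a minimal sketch of the X config change that fixed it for me — the file path and Identifier are arbitrary, and this assumes a single NVIDIA device:

```
# /etc/X11/xorg.conf.d/10-nvidia-egpu.conf  (example path)
Section "Device"
    Identifier "NVIDIA eGPU"
    Driver     "nvidia"
    Option     "AllowExternalGpus" "True"
EndSection
```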

Another user had a similar issue with that initramfs error.
Did you try update-initramfs -c -k all?
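That is, recreate the initramfs images for all installed kernels and reboot:

```
sudo update-initramfs -c -k all
sudo reboot
```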

Thanks!! Adding the AllowExternalGpus option to the Xorg config did indeed solve the boot issue.

Before this fix, I had noticed that I could boot properly without the eGPU, log in to the desktop using the integrated Intel GPU, then hot-plug the eGPU after logging in, and it would work as expected for CUDA applications (the desktop environment would still run on the Intel GPU). Hot-unplug did not (and still does not) work, as mentioned in the NVIDIA driver notes.

After the fix, I can now reboot safely with the eGPU plugged in. However, it now prefers the eGPU for rendering the desktop environment; how can I prioritize the integrated Intel GPU for this? The prime-select command doesn’t seem to do anything, judging by intel_gpu_top and the Xorg processes/power usage listed by nvidia-smi.
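For reference, this is how I’m checking which GPU renders the desktop:

```
# Watch integrated GPU utilization (from the intel-gpu-tools package)
sudo intel_gpu_top

# List processes on the NVIDIA GPU; an Xorg entry here means
# the desktop is rendering on the eGPU
nvidia-smi
```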

I had not run this manually (although I think apt was already triggering initramfs updates after the related installs); running it did not resolve the error message in journalctl.

I also tested whether hot-plug (not hot-unplug) still works. Unsurprisingly, it does not: it enters an unstable state where the GPU is not even detected by lspci -nnv | grep -A 18 'VGA' or in the IOMMU group lists. Even across reboots, the GPU is no longer detected (although the Thunderbolt controller and the Akitio Node enclosure still are). However, commenting out the AllowExternalGpus option and rebooting (with the eGPU on) into the original black screen seems to perform some kind of reset: over SSH I can then detect the eGPU and run CUDA or nvidia-smi. If I then shut down, turn off the eGPU, turn on the NUC, log in, and then turn on the eGPU (hot-plug), this brings it back to the original state.
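The detection checks, for reference (lspci is the one quoted above; boltctl comes from the bolt package and is my assumption for how to query Thunderbolt):

```
# Is the NVIDIA card visible on the PCI bus?
lspci -nnv | grep -A 18 'VGA'

# Is the Thunderbolt enclosure still enumerated?
boltctl list
```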

So it looks like I can either boot with the eGPU on or hot-plug the eGPU after boot, but not both. The advantage of the hot-plug method is that the desktop renders on the Intel GPU; the advantage of the boot method is that it’s easier to use (and reboot) remotely.