NVIDIA Driver 384.59 Not Installing - Can't load nvidia-drm, can't open display

I’m trying to install the 384.59 driver on an Ubuntu 16.04 OS, but it’s failing to install. The issues seem to be:

  1. From nvidia-installer.log:
    ERROR: Unable to load the 'nvidia-drm' kernel module.
    ERROR: Installation has failed.

  2. From nvidia-bug-report.log:


xset -q:
xset: unable to open display ""


nvidia-settings -q all:
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
ERROR: Unable to find display on any available system
ERROR: Unable to find display on any available system


xrandr --verbose:
Can't open display :0


Running window manager properties:
Unable to detect window manager properties


I just can’t figure out what to do about these messages, or if I’m even looking in the wrong place.

nvidia-installer.log (2.67 KB)
nvidia-bug-report.log.gz (57.5 KB)

The initial problem wasn’t the driver but the kernel and gcc with the retpoline patches.
Then you installed the 384.59 .run over the 384.111 driver from the Ubuntu packages. This has to be fixed first.
Uninstall it with the same 384.59 .run installer, using the --uninstall option. Then reinstall mesa and the driver from the Ubuntu packages:
sudo apt install --reinstall libgl1-mesa-glx nvidia-384
Then purge and reinstall the -116 kernel:
https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/1750937

Thanks, but I’m afraid that didn’t do the trick. The commands I used were:

sudo ./NVIDIA-Linux-x86_64-384.59.run --uninstall
sudo apt-get install reinstall libgl1-mesa-glx nvidia-384
sudo apt-get purge linux-image-4.4.0-116-generic
sudo apt-get install linux-image-4.4.0-116-generic

There was no seeming change in the boot, but I lost internet connectivity.

I am now booting into my 112 kernel so that I can keep access to my files via my LAN. Oddly, when I reach the login screen, it shows the default Ubuntu background instead of my own. Then, when I log in, I get to my desktop but cannot access the launcher, and the windows that I manage to open don’t close… Does this make any sense?

You may also need to recompile the nvidia modules for the -116 kernel.

sudo dkms remove nvidia-384/384.111 -k 4.4.0-116-generic
sudo dkms install nvidia-384/384.111 -k 4.4.0-116-generic

Please attach a new nvidia-bug-report.
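
To confirm that the rebuild above actually produced a module for the -116 kernel, the output of dkms status can be checked. This is only a minimal sketch: dkms_installed_for is a hypothetical helper name, and the line format it parses ("module, version, kernel, arch: installed") is an assumption about the dkms version shipped with Ubuntu 16.04.

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms): reads `dkms status` output on stdin
# and succeeds if module $1 is listed as installed for kernel $2.
dkms_installed_for() {
    module=$1
    kernel=$2
    # Assumed line format:
    #   nvidia-384, 384.111, 4.4.0-116-generic, x86_64: installed
    awk -F', ' -v m="$module" -v k="$kernel" '
        $1 == m && $3 == k && /: installed/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

Possible usage: dkms status | dkms_installed_for nvidia-384 4.4.0-116-generic && echo "module built for -116"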

I tried this, but things still aren’t working. I’m attaching the bug report (thanks):
nvidia-bug-report.log.gz (121 KB)

Kernel/driver/Xorg seems to be fine according to the logs. Maybe OpenGL broke. Use

sudo apt-get install --reinstall libgl1-mesa-glx libgl1-mesa-dri
sudo dpkg-reconfigure xserver-xorg
sudo apt-get install mesa-utils

to get things straight, reboot and post the output of

glxinfo | grep Open

Well, it still has the login loop (i.e. the login screen shows the default Ubuntu background; when the password is entered, the screen goes to an empty desktop, then returns to the login screen).

The result of glxinfo | grep Open is:

Error: unable to open display

Maybe some permission problem. Is there anything in the journal?

Also, please post the output of
ls -l /dev/nv*

Here’s the output from
ls -l /dev/nv*

crw-rw-rw- 1 root root 195,   0 Mar 18 18:03 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Mar 18 18:03 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Mar 18 18:03 /dev/nvidia-modeset
crw------- 1 root root 248,   0 Mar 18 18:03 /dev/nvme0
brw-rw---- 1 root disk 259,   0 Mar 18 18:03 /dev/nvme0n1
brw-rw---- 1 root disk 259,   1 Mar 18 18:03 /dev/nvme0n1p1
brw-rw---- 1 root disk 259,   2 Mar 18 18:03 /dev/nvme0n1p2
brw-rw---- 1 root disk 259,   3 Mar 18 18:03 /dev/nvme0n1p3

I don’t know how/which journal I should be checking.

Use
sudo journalctl -b 0
to see the system log from the current boot.
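
Since the full journal for one boot can be long, it may help to narrow it down to driver-related lines before reading it. A minimal sketch, assuming that matching on "nvidia" and on the "NVRM" prefix of the NVIDIA kernel module's messages is enough; nvidia_journal_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines from stdin that mention the
# NVIDIA driver. NVRM is the prefix the NVIDIA kernel module uses for its
# kernel log messages.
nvidia_journal_lines() {
    grep -i -E 'nvidia|NVRM'
}
```

Possible usage: sudo journalctl -b 0 | nvidia_journal_lines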


Generix, thanks for keeping up with this. Here’s the journal output:
temp.txt (167 KB)

Hello Generix,

I also had the installation problem with the retpoline patches.
I edited the kernel headers just to fix the magic number problem that appears when the NVIDIA module is inserted, but of course that is only a workaround.
I posted this workaround here:
https://devtalk.nvidia.com/default/topic/1030325/nvidia-driver-installation-v387-26-on-ubuntu-16-04/

Do you know what exactly is causing the trouble?
I saw that my kernel (Ubuntu 3.13-143) is using the retpoline patches.
I am not sure about the gcc compiler.
I am using NVIDIA driver 387-34, which ran fine some weeks ago, before my system was updated.
As I understand it, the kernel is using retpoline but the newly compiled kernel module is not. Why this is the case is not clear to me. Maybe the compiler is not up to date (I was using gcc 4.8.5-2 from Ubuntu for the NVIDIA module).

BR
Sven Grundmann

Sven, the reason is that the -116 kernel has been compiled with the Spectre V2 mitigation (retpoline). So out-of-tree modules like the nvidia or virtualbox drivers have to be compiled with that same gcc, too. Unfortunately, the Ubuntu update process had a flaw there: the kernel (and OOT modules) got updated first and the new gcc only came in later, so the already compiled modules don’t load. Sometimes it even seemed like the new gcc was not being delivered at all.
So if you hit this issue, run a system update again to be sure you have the new gcc, and then rebuild the modules in one of three ways: purge and reinstall the kernel, issue the commands from post #4 for nvidia alone, or use sudo /usr/lib/dkms/dkms_autoinstaller start to rebuild all modules for the running kernel.
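
The mismatch described above shows up in the modules' "vermagic" strings: a retpoline kernel expects the word retpoline there, and a module built with the old gcc lacks it. The following is a rough sketch of that comparison, not a definitive check; has_retpoline and vermagic_verdict are hypothetical helper names, and the module names in the usage line are assumptions to be replaced with the ones on your system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the space-separated vermagic string $1
# contains the word "retpoline".
has_retpoline() {
    case " $1 " in
        *" retpoline "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: compares the vermagic of an in-tree module ($1, which
# reflects what the kernel expects) with that of an out-of-tree module ($2).
vermagic_verdict() {
    kernel_magic=$1
    module_magic=$2
    if has_retpoline "$kernel_magic" && ! has_retpoline "$module_magic"; then
        echo "rebuild needed: module lacks retpoline"
    else
        echo "ok"
    fi
}
```

Possible usage, with assumed module names (an in-tree module first, the nvidia module second): vermagic_verdict "$(modinfo -F vermagic e1000e)" "$(modinfo -F vermagic nvidia_384)"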

smg628, I’m not sure what’s happening on your system. The nvidia-bug-report logs show that the kernel, driver and Xserver are starting just fine. There are no errors in the journal either that would explain anything.
Please try to log in and afterwards attach the file ‘.xsession-errors’ from your home directory to your post. Please also describe again what exactly you’re seeing. Also tell me whether you’re using some kind of special DM/DE or just the normal lightdm/Unity combo.

Generix, I’m attaching the xsessions-errors file and logs from lightdm. Now, for ALL details.

I have 2 kernels - 116 & 112.

The problem started after I kicked a power strip and my machine got a quick off/on. Since I still have a lot of functionality, I didn’t think there was a hardware issue, but now I’m less certain…

The two kernels show different behavior, though neither one works.

Kernel 116:

Boots into login screen with default Ubuntu background. After entering password, goes to pure default Ubuntu background as though going to desktop, but then returns to login screen.

Kernel 112:

Boots into login screen with default Ubuntu background. After entering password, opens up desktop, with my background, but:

  1. No access to the application launcher
  2. When I double-click an icon on the desktop, it opens, but the window lacks the close/hide/full-screen control buttons

xsessions-errors.txt (878 Bytes)
seat0-greeter.log (6.34 KB)
x-0.log (1 KB)
lightdm.log (5.69 KB)

.xsession-errors is needed when booting the -112 kernel. Is the file you attached from a -116 boot?

Please also post the output of
find /usr/ -name "libGL*" -exec ls -l {} \;

Here are the xsession-errors from the 112 kernel and a list of my libGL files:
xsession-errors-112.txt (878 Bytes)
libGLlist.txt (5.94 KB)