Black screen in Ubuntu 18 even after purging Nvidia and installing drivers from repository

This was the sequence I followed to update the Nvidia drivers (previously NVIDIA-SMI 430.40 Driver Version: 430.40 CUDA Version: 10.1) on my REMOTE Ubuntu 18.04 workstation:

sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update
sudo ubuntu-drivers --gpgpu nvidia install

When I log in with TeamViewer, the screen is completely black.

Did I do anything wrong? How do I fix this?

I can ssh into the machine and nvidia-smi shows the following:

$ nvidia-smi

Fri Nov 17 01:46:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   33C    P8    21W / 250W |     25MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:41:00.0  On |                  N/A |
| 33%   54C    P0    70W / 250W |     55MiB / 11018MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3353      G   /usr/lib/xorg/Xorg                 16MiB |
|    0   N/A  N/A      5414      G   /usr/bin/gnome-shell                6MiB |
|    1   N/A  N/A      3353      G   /usr/lib/xorg/Xorg                 53MiB |
+-----------------------------------------------------------------------------+

$ lsmod | grep nvidia

nvidia_uvm           1015808  0
nvidia_drm             57344  6
nvidia_modeset       1196032  5 nvidia_drm
nvidia              35446784  274 nvidia_uvm,nvidia_modeset
drm_kms_helper        172032  1 nvidia_drm
drm                   401408  10 drm_kms_helper,nvidia,nvidia_drm

Any ideas what's going on? Thank you!

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

I continued making attempts to fix things, so I purged Nvidia and ended up installing driver version 510 (since 470 has a terrible reputation in online forums). TeamViewer got worse: the app on my laptop now doesn't even detect TeamViewer on the Ubuntu 18 workstation as connected. I updated TeamViewer and launched it from the command line, but it's still not showing up as an active connection on my laptop.
[Fortunately, I can still access the workstation via SSH by going through another machine; somehow, my laptop can't SSH directly into the workstation either, but I can get to it through another workstation...].

Here's the nvidia-bug-report.log.gz attached.
nvidia-bug-report.log.gz (407.4 KB)

The driver is working, and an Xserver is running on the two nvidia gpus, though I can't see any WM/DE/DM. At one point you had gdm running, but that seems to be gone. What did you set now as your DM? Furthermore, there's a kernel parameter "nomodeset" set; please remove it and set nvidia-drm.modeset=1 instead.

Yeah, before purging nvidia the DM was gdm.
The purge basically consisted of the following:

# Purge NVIDIA packages
sudo apt purge $(dpkg -l | grep nvidia | awk '{print $2}')

# Purge CUDA packages
sudo apt purge $(dpkg -l | grep cuda | awk '{print $2}')

# Purge libcudnn packages
sudo apt purge $(dpkg -l | grep libcudnn | awk '{print $2}')

# Purge libglvnd0 packages
sudo apt purge $(dpkg -l | grep libglvnd0 | awk '{print $2}')

# Autoremove to remove any unnecessary dependencies
sudo apt autoremove
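
A note on the purge commands above: `grep nvidia` matches any `dpkg -l` line that mentions the string, including package descriptions and packages that are already removed, and `apt purge` also removes whatever depends on the matched packages. That may be how parts of the desktop (such as gdm) went missing after purging libglvnd0, though the thread does not confirm it. Below is a minimal sketch of a more cautious preview. It assumes the usual `dpkg -l` column layout (status first, package name second), and the helper name `installed_matching` is made up for this example.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print only the names
# of currently installed packages (status "ii") whose NAME matches the pattern.
installed_matching() {
    awk -v pat="$1" '$1 == "ii" && $2 ~ pat { print $2 }'
}

# Preview first: -s (simulate) lists what apt would remove without changing anything.
#   apt-get -s purge $(dpkg -l | installed_matching nvidia)
```

If the simulated run lists packages such as gdm3 or gnome-shell for removal, that is a sign the pattern is too broad for a purge.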

After reinstalling things, I got a message prompting me to choose a DM, but gdm was not among the options, so I chose lightdm.

I can verify it's lightdm by running:
$ cat /etc/X11/default-display-manager
/usr/sbin/lightdm

I also ran nvidia-xconfig when I installed nvidia-driver-470, the version that initially broke things, but not after installing nvidia-driver-510. I'm mentioning it in case the first run broke xorg somehow. I do get this weird warning when SSH-ing into another workstation with ssh -xy, then from there SSH-ing into the broken workstation with ssh -xy as well, and transferring files back using rsync -avr:

"Warning: No xauth data; using fake authentication data for X11 forwarding."

I do have an ~/.Xauthority file, but when I print its contents with 'cat' from the terminal, they look a bit weird:

$ cat ~/.Xauthority
jgalazlinux11MIT-MAGIC-COOKIE-1+??Ó„?n.
??.a?
jgalazlinux12MIT-MAGIC-COOKIE-1
11MIT-MAGIC-COOKIE-1W?f?r??Š»x =?F?2@3F?y9Y??13MIT-MAGIC-COOKIE-1AŪ‘?}???Ū?M12MIT-MAGIC-COOKIE-1A/Š²a?+D?t)?
kK2?10MIT-MAGIC-COOKIE-1?!???W??g?GĖ¢??F
jgalazlinux10MIT-MAGIC-COOKIE-1?ā€˜?ā€™?n??i??jx

Q1) Should I overwrite the contents of this file with "xauth generate :0 ." as suggested by some online forums?

Regarding "nomodeset", I set that by modifying the variable GRUB_CMDLINE_LINUX_DEFAULT="nomodeset" in /etc/default/grub, which was previously empty (some site suggested it as a way to prevent nvidia from going into sleep or standby mode, or something like that, but I only did that when I had the 470 version of the driver, not for 510).

I see there's another variable that is empty in /etc/default/grub, GRUB_CMDLINE_LINUX="", though I'm not sure what the difference is exactly (something about setting the mode for all kernels vs the default kernel?).

Further searching online, it seems that people sometimes set the equivalent of "nvidia-drm.modeset=1" by creating a /etc/modprobe.d/nvidia.conf file containing the line "options nvidia-drm modeset=1".

Q2) What's the best (and safest) place to set this variable?

I opted to try the latter option (/etc/modprobe.d/nvidia.conf) and rebooted, but TeamViewer on the remote workstation still seems to be offline when I try to access it from the app on my laptop.

I get this security message when logging in via SSH (probably unrelated, but pasting it just in case; also attaching an updated nvidia-bug-report.log.gz):
nvidia-bug-report_2.log.gz (401.9 KB)

"Expanded Security Maintenance for Infrastructure is not enabled.
0 updates can be applied immediately.
178 additional security updates can be applied with ESM Infra.
Learn more about enabling ESM Infra service for Ubuntu 18.04 at
https://ubuntu.com/18-04"

Q3) Should I enable ESM Infra?

I also set GRUB_CMDLINE_LINUX_DEFAULT="nvidia-drm.modeset=1", as suggested. No difference. I'm attaching a new nvidia-bug-report log generated after this as well.
nvidia-bug-report_3.log.gz (390.3 KB)

The magic cookie is binary, i.e. it contains non-printable characters, so it's expected to get weird output if cat'ed. The Xorg log from when you created an xorg.conf was in the bug report; there was a path missing, so it didn't find the nvidia module. Maybe your Teamviewer is outdated. One change between driver versions 430 and 470 was the addition of the nvidia output sink feature, so since you have two gpus, they're working differently now. To mimic the previous behaviour, try creating /etc/X11/xorg.conf only containing

Section "Files"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg,/usr/lib/xorg/modules"
EndSection

Section "Serverflags"
    Option "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    BusID "PCI:65:0:0"
    Option "ProbeAllGpus" "false"
EndSection
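
For context on the BusID value: xorg.conf expects the PCI address in decimal, while nvidia-smi and lspci print it in hexadecimal. The GPU listed at 00000000:41:00.0 is on bus 0x41, which is 65 in decimal, hence PCI:65:0:0. A small sketch of that conversion follows; the function name `pci_to_xorg_busid` is made up for this example.

```shell
# Hypothetical helper: convert a PCI address as printed by nvidia-smi/lspci
# (hexadecimal, with or without a leading domain) into the decimal
# "PCI:bus:device:function" form used in xorg.conf.
pci_to_xorg_busid() (
    addr="$1"
    # Drop the leading PCI domain (e.g. "00000000:") if one is present.
    case "$addr" in
        *:*:*) addr="${addr#*:}" ;;
    esac
    bus="${addr%%:*}"      # e.g. "41"
    devfn="${addr#*:}"     # e.g. "00.0"
    dev="${devfn%%.*}"     # e.g. "00"
    fn="${devfn#*.}"       # e.g. "0"
    # printf interprets the 0x-prefixed values as hex and prints them in decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
)

pci_to_xorg_busid 00000000:41:00.0   # prints PCI:65:0:0
```

By the same rule, the other GPU at 00000000:0A:00.0 would be PCI:10:0:0.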

Regarding ESM, Ubuntu 18.04 has been out of public support for more than a year now; ESM is extended, paid support.

I did sign up for Ubuntu Pro, which provides extended security support for Ubuntu 18.04 for up to 5 machines for free, but that didn't seem to make a difference (I think it got removed with the purge described above, in between installing the 470 and 510 versions of the Nvidia drivers).

From the TeamViewer website, it seems that the latest version is 15.47.3 (Linux), which is exactly what I'm running on the workstation that is misbehaving:

~$ teamviewer version

TeamViewer 15.47.3 (DEB)

TeamViewer seems to be running fine, I think?

$ teamviewer daemon status

systemctl status teamviewerd.service
● teamviewerd.service - TeamViewer remote control daemon
   Loaded: loaded (/etc/systemd/system/teamviewerd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2023-11-20 13:15:32 PST; 24min ago
  Process: 2986 ExecStart=/opt/teamviewer/tv_bin/teamviewerd -d (code=exited, status=0/SUCCESS)
 Main PID: 3031 (teamviewerd)
    Tasks: 77 (limit: 19660)
   CGroup: /system.slice/teamviewerd.service
           └─3031 /opt/teamviewer/tv_bin/teamviewerd -d

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

[No idea about the "Journal" warning...].

I updated TeamViewer after installing nvidia-driver-470, hoping that would fix the black screen, but it didn't; on the contrary, since the update the workstation appears as offline in the TeamViewer app on my MacBook. (I also can't access it via SSH directly anymore and have to hop into it from another machine in the workstation's network, but that may be related to an antivirus I installed, which was required to try to set up VPN access...; now I'm not sure how to get rid of it for SSH to work directly again.)

I had xorg.conf.backup, xorg.conf.backup2, and xorg.conf.nvidia-xconfig-original files in /etc/X11, no xorg.conf.

I created xorg.conf with the suggested content and rebooted.
TeamViewer on my MacBook still isn't detecting the workstation as connected (do I need to do anything on the Mac side?).

I'm attaching an updated nvidia-bug-report log:
nvidia-bug-report_4.log.gz (391.7 KB)

If you can't directly ssh into the system, there's likely something blocking access in the network, so teamviewer won't work either.

I think the SSH problem is unrelated, as it happened several weeks earlier.
I was trying to set up a VPN so that I didn't have to use TeamViewer (it's disfavored at my company, and also expensive, as I have to pay myself for the version that allows me to connect to multiple machines).
I also tried AnyDesk.
Both TeamViewer and AnyDesk worked perfectly fine even after SSH stopped working; then I updated the Nvidia driver from 430 to 470 and both TeamViewer and AnyDesk started showing a black screen.
I then purged everything as described above, and updated TeamViewer, to no avail :-(

Already checked teamviewer logs?
https://community.teamviewer.com/English/kb/articles/4694-find-your-log-files

Thank you for the tip! I generated the logs this morning (attaching them here), but there are so many files, not sure what to look for.

tvlog_jgalazlinux_2023-11-22_su_jgalaz.zip (17.5 MB)

In the Xauth log, I found this potentially suspicious? (mentions "error")

www.ubuntu.com/support)
[ 19.036] Current version of pixman: 0.34.0
[ 19.036] Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
[ 19.036] Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 19.036] (==) Log file: "/var/log/Xorg.0.log", Time: Tue Nov 21 18:58:38 2023
[ 19.037] (==) Using config file: "/etc/X11/xorg.conf"
[ 19.037] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 19.038] (==) No Layout section. Using the first Screen section.
[ 19.038] (==) No screen section available. Using defaults.
[ 19.038] (**) |-->Screen "Default Screen Section" (0)
[ 19.038] (**) | |-->Monitor ""
[ 19.038] (==) No device specified for screen "Default Screen Section".
Using the first device section listed.
[ 19.038] (**) | |-->Device "Device0"
[ 19.038] (==) No monitor specified for screen "Default Screen Section".
Using a default monitor configuration.
[ 19.038] (**) Option "AutoAddGPU" "false"
[ 19.038] (==) Automatically adding devices
[ 19.038] (==) Automatically enabling devices
[ 19.038] (**) Not automatically adding GPU devices
[ 19.038] (==) Automatically binding GPU devices
[ 19.039] (==) Max clients allowed: 256, resource mask: 0x1fffff

Then there's the crash logs, which are very long, but here's the last bit (mentions "abort")

Thread 1 (Thread 0x7f9129f9b700 (LWP 10024)):
#0 0x00007f91434a853f in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#1 0x00007f91434aa098 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#2 0x00007f9165343fc8 in __GI___backtrace (array=, size=) at ../sysdeps/x86_64/backtrace.c:111
arg = {array = 0xe5aec0, cfa = 140261451213216, cnt = 23, size = 200}
once = 2
#3 0x000000000057754d in ?? ()
No symbol table info available.
#4 0x0000000000575c60 in ?? ()
No symbol table info available.
#5 0x00000000005b0ab9 in ?? ()
No symbol table info available.
#6
No locals.
#7 0x00007f91434a853f in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#8 0x00007f91434aa098 in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
No symbol table info available.
#9 0x00007f9165343fc8 in __GI___backtrace (array=, size=) at ../sysdeps/x86_64/backtrace.c:111
arg = {array = 0xe5aec0, cfa = 140261451213216, cnt = 16, size = 200}
once = 2
#10 0x000000000057754d in ?? ()
No symbol table info available.
#11 0x0000000000575c60 in ?? ()
No symbol table info available.
#12 0x00000000005b0ab9 in ?? ()
No symbol table info available.
#13
No locals.
#14 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
set = {__val = {0, 9042521604759584, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 72057594037993472, 0, 7310587295074448245, 41827680}}
pid =
tid =
ret =
#15 0x00007f91652537f1 in __GI_abort () at abort.c:79
save_stage = 1
act = {__sigaction_handler = {sa_handler = 0xd68, sa_sigaction = 0xd68}, sa_mask = {__val = {140262447773312, 45, 3432, 11587611, 140262444231375, 45, 16125368492513863015, 0, 140262447773312, 11587611, 45, 140262447755936, 140262447789832, 41657568, 140262444180063, 0}}, sa_flags = 15124704, sa_restorer = 0xe6c8e0 }
sigs = {__val = {32, 0 <repeats 15 times>}}
__cnt =
__set =
__cnt =
__set =
#16 0x0000000000abe3ce in ?? ()
No symbol table info available.
#17 0x0000000000abd567 in ?? ()
No symbol table info available.
#18 0x0000000000abd58f in ?? ()
No symbol table info available.
#19 0x0000000000856e6c in ?? ()
No symbol table info available.
#20 0x0000000000aac903 in ?? ()
No symbol table info available.
#21 0x0000000000aac941 in ?? ()
No symbol table info available.
#22 0x0000000000aac96c in ?? ()
No symbol table info available.
#23 0x00007f9165256031 in __run_exit_handlers (status=1, listp=0x7f91655fe718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
atfct =
onfct =
cxafct =
f =
new_exitfn_called = 2094
cur = 0x27ba4e0
#24 0x00007f916525612a in __GI_exit (status=) at exit.c:139
No locals.
#25 0x00000000005baa99 in ?? ()
No symbol table info available.
#26 0x00007f9141d74a5e in ?? ()
No symbol table info available.
#27 0x00000000029ac570 in ?? ()
No symbol table info available.
#28 0x00007f9141d722ed in ?? ()
No symbol table info available.
#29 0x00000000029ac570 in ?? ()
No symbol table info available.
#30 0x00007f9141d63ca0 in ?? ()
No symbol table info available.
#31 0x00000000027df2b0 in ?? ()
No symbol table info available.
#32 0x00007f9129f9aae8 in ?? ()
No symbol table info available.
#33 0x0000000000000000 in ?? ()
No symbol table info available.
UpgradeStatus: No upgrade log present (probably fresh install)

The line you found is just an informational legend stating that (EE) means error. There are no lines marked (EE) in the log; the Xserver is running perfectly fine.
I checked the teamviewer logs: when you initially installed the 470 driver, the second nvidia gpu came alive and teamviewer connected to that. Since it didn't have any monitor connected, it used the virtual framebuffer, resulting in a black 640x480 screen. All you needed to do in that case was set the kernel parameter nvidia-drm.modeset=1 so both nvidia gpus would have worked together. I don't know what you then did on November 17th, but since then teamviewer isn't even trying to connect to the Xserver anymore. I guess you should purge and reinstall teamviewer, likely locally on the host.
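
The marker legend is easy to mistake for an error because it contains the text "(EE)". Here is a minimal sketch of a filter that reports only real error lines, i.e. lines where (EE) directly follows the bracketed timestamp. The function name `xorg_errors` is made up for this example, and the timestamp layout is assumed to match the excerpt quoted above.

```shell
# Hypothetical helper: read an Xorg log on stdin and print only the lines whose
# marker, right after the "[ seconds.millis]" timestamp, is (EE).
# The legend line is skipped because its "(EE)" does not follow a timestamp.
xorg_errors() {
    grep -E '^\[ *[0-9]+\.[0-9]+\] \(EE\)'
}

# Usage on the workstation:
#   xorg_errors < /var/log/Xorg.0.log
```

No output means no (EE) lines. The same pattern with `\(WW\)` would list warnings instead.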

I likely won't have local access to the host until January 2024, but I can SSH into it (via another computer).
Will purging and reinstalling TeamViewer via SSH work? (I updated to the latest TeamViewer over SSH, after installing nvidia-driver-470, but before installing 510).

Also, I've set GRUB_CMDLINE_LINUX_DEFAULT="nvidia-drm.modeset=1" in /etc/default/grub, and also created a /etc/modprobe.d/nvidia.conf file containing the line "options nvidia-drm modeset=1".
Is it OK to have it in both places, or is one preferred over the other?

Is it worth it to revert to 470 now that nvidia-drm.modeset=1 is in place?

https://community.teamviewer.com/English/kb/articles/4352-install-teamviewer-classic-on-linux-without-graphical-user-interface

Doesn't matter, I just prefer the kernel cmdline because 1) it's visible in the logs 2) it doesn't require rebuilding the initrd

I wouldn't know why.
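
Whichever place is used, it helps to confirm what the running kernel actually received. Here is a minimal sketch, assuming a standard Ubuntu setup where edits to /etc/default/grub only take effect after `sudo update-grub` and a reboot. The helper name `has_kernel_param` is made up for this example; /proc/cmdline and /sys/module/nvidia_drm/parameters/modeset are the standard kernel interfaces.

```shell
# Hypothetical helper: succeed if the given parameter appears as a whole word
# in the kernel command line (second argument, defaulting to /proc/cmdline).
has_kernel_param() (
    set -f                                   # no glob expansion while word-splitting
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        [ "$word" = "$param" ] && exit 0
    done
    exit 1
)

# Usage on the workstation (after `sudo update-grub` and a reboot):
#   has_kernel_param nomodeset && echo "nomodeset is still set"
#   has_kernel_param nvidia-drm.modeset=1 && echo "modeset requested on the cmdline"
#   cat /sys/module/nvidia_drm/parameters/modeset   # "Y" once the driver has it enabled
```

The sysfs check covers both routes, since it reflects the module's effective setting regardless of whether it came from the kernel cmdline or from /etc/modprobe.d.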

1) Purged TeamViewer and reinstalled. Seems to have gone well, except for a warning:

$ sudo apt install ./teamviewer_amd64.deb
(etc etc etc...)
Setting up teamviewer (15.48.4) ...
gpg: WARNING: unsafe ownership on homedir '/home/jgalaz/.gnupg'

Is it troublesome? (If so, how do I solve it?)

2) After adding the machine as a trusted device again per the link you shared, and rebooting, TeamViewer on my Mac detects the offending workstation again!
However, when I connect to it, the only screen resolution available is extremely low, and the screen looks primitive. When I type in the password to log in, I get an error: "failed to start session" (see screenshot attached).

3) Also, when I check the status from the terminal (I'm sshing into another workstation, and then from there into the offending workstation), I get an error:

$ teamviewer status
Init...
No protocol specified
xprop: unable to open display ':0'
CheckCPU: SSE2 support: yes
Checking setup...
Launching TeamViewer ...
Launching TeamViewer GUI ...
Aborted (core dumped)

4) Attaching updated tvlog here in case it helps; got this message at the very beginning when generating it:
cp: cannot stat '/opt/teamviewer/logfiles/lightdm': No such file or directory

tvlog_jgalazlinux_2023-11-25_su_jgalaz.zip (16.9 MB)

5) Also puzzled by getting this message when I use rsync to transfer files from the offending workstation to other machines:
Warning: No xauth data; using fake authentication data for X11 forwarding.

Teamviewer can't access the lightdm Xauthority file
XClient[:0]: Unable to open XAuthority file "/var/run/lightdm/root/:0" (err=13)
so it can't connect to the real Xserver and starts a fake one.
Please post the output of
ps a |grep X

Looks like you're using a broken xorg.conf again. Please delete all xorg.conf files and use only the one I gave you.

Darn, yeah, in one of the iterations I tried after TeamViewer started detecting the workstation again, I did restore the old xorg.conf just to see whether that worked. I tried with the one you gave me first, though. I've restored that now (below), and the resolution has improved! It even detects both monitors (the second one has even better resolution). See screenshot attached:

TeamViewer session still fails to start though.

The output of ps a |grep X is:

3013 tty7 Ssl+ 0:01 /usr/lib/xorg/Xorg -core +iglx :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
16604 pts/0 S+ 0:00 grep --color=auto X

This is in xorg.conf:

$ cat xorg.conf
Section "Files"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg,/usr/lib/xorg/modules"
EndSection

Section "Serverflags"
    Option "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    BusID "PCI:65:0:0"
    Option "ProbeAllGpus" "false"
EndSection

So now xorg and teamviewer are fine; don't touch their config. The message "Failed to start session" points to an issue with gnome. When switching from gdm to lightdm, did you additionally install lightdm, or did you remove gdm and accidentally also gnome? Please check the lightdm logs.

I only did the purge commands in one of the messages above (purged nvidia, cuda, libcudnn, and libglvnd0).
Did not intend to mess with gnome or to switch from gdm to lightdm. In the middle of the purge, the terminal asked me to pick a DM, and gdm was not in the list. The only one I recognized was lightdm, so I picked that.

Tried this, but it didn't work: 16.04 - Login with LightDM fails - Ask Ubuntu

Attaching lightdm log:
lightdm.log (18.5 KB)

Per this discussion, [SOLVED] 18.04 LTS - GNOME - Failed to start session, tempted to try:
sudo apt-get install --reinstall gnome-session
sudo apt-get install --reinstall ubuntu-desktop

But I'm a bit wary of breaking more things while trying to fix things, so I'll wait for the green light from you, or for any better advice you may provide.
Thanks a lot for all your help!
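
Before reinstalling anything, it may be worth checking which desktop packages are actually missing. The sketch below is one way to do that under a few assumptions: the package list (gdm3, gnome-session, gnome-shell, ubuntu-desktop) is only a guess at what the purge may have removed, and the helper name `not_installed` is made up for this example. It reads lines in the "name status" form produced by `dpkg-query -W -f='${Package} ${Status}\n'`.

```shell
# Hypothetical helper: given "package want ok state" lines on stdin, print each
# package named on the command line that is NOT in state "install ok installed".
not_installed() {
    awk -v wanted="$*" '
        $2 == "install" && $3 == "ok" && $4 == "installed" { ok[$1] = 1 }
        END {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++)
                if (!(w[i] in ok)) print w[i]
        }
    '
}

# Usage on the workstation:
#   dpkg-query -W -f='${Package} ${Status}\n' | not_installed gdm3 gnome-session gnome-shell ubuntu-desktop
# A dry run of the reinstall shows what apt would change without doing it:
#   apt-get -s install --reinstall gnome-session ubuntu-desktop
```

Packages printed by the helper are either absent or only partially removed (for example, left in the "config-files" state after a removal).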