After installing the latest CUDA packages on my Ubuntu system (kernel 5.15.0-48-generic), the graphics no longer work as they should, and the card has become useless as a computing tool.
Symptoms:
Black screen; seems to have problems with mode setting.
Very slow nvidia-smi (it does eventually give a result, but only after more than 10 s):
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 49%   63C    P0   121W / 350W |     17MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Also note the high power draw even with no running processes!
Xorg also seems to be spinning at full throttle without producing anything useful:
inxi -t c
Processes:
CPU top: 5 of 356
1: cpu: 97.7% command: xorg pid: 4279
If I remove all the NVIDIA/CUDA packages and then install nvidia-driver-510-server (driver version 510.85.02), the graphics come back.
However, TensorFlow (2.11.0-dev20221005) then fails to use the GPU due to "Could not load dynamic library 'libnvinfer.so.7'", and I am not able to pull together all the needed libraries to get it back into a working state without apt dragging in the new NVIDIA drivers again.
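For anyone debugging the same thing, here is a generic check (not specific to this setup) of what TensorFlow actually sees and whether a libnvinfer library is resolvable at all:

# List the GPUs TensorFlow can see; missing libraries such as libnvinfer.so.7
# are reported as warnings during import.
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Check whether a libnvinfer (TensorRT) library is registered with the dynamic linker:
ldconfig -p | grep libnvinfer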
I have attached a bug report (which takes forever to generate, again probably because the mode set is slow or times out before continuing): nvidia-bug-report.log.gz (440.4 KB)
In preparation for testing the next-generation RTX 4090, I upgraded to CUDA 11.8 on two workstations, with a similar FATAL RESULT:
Black display, seen both with Ubuntu 20.04 and an RTX 3090 as well as with 22.04 and an RTX 3080 Ti.
Both systems are running kernel 5.15.0-48-generic.
The workstation with Ubuntu 22.04 sometimes rejects SSH connections; top shows 100% load from an NVIDIA process, then from Xorg, and two minutes later from plymouthd.
Thanks, but still no success.
I just tried switching from HDMI to DisplayPort on both systems (20.04 and 22.04): still a black display, and via SSH, top shows Xorg at 100% load even 10 minutes after reboot.
Just wanted to add a "me too". This is on a clean, fresh Ubuntu 22.04 install with an RTX 3090. CUDA 11.7 / 515.65.01 works perfectly. CUDA 11.8 / 520 fails to boot as described in your post.
amrits, can you describe the workaround so we can install 11.8? The current deb install isn't just unusable, it makes systems unbootable. It seems like this should be a high-priority hotfix, with a workaround procedure in the meantime.
Hey there, my workaround is to install NVIDIA driver 520.56 first. When installing CUDA 11.8, follow every step but change the very last step to sudo apt-get install nvidia-cuda-toolkit. This does not erase your local driver and prevents the driver crash.
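In case it is useful, here is a rough sketch of that sequence as shell commands; the package names (nvidia-driver-520, nvidia-cuda-toolkit) are assumptions based on the usual Ubuntu naming, so check what your repositories actually provide:

# Install the 520-series display driver on its own first
sudo apt-get update
sudo apt-get install nvidia-driver-520
# Follow the CUDA 11.8 instructions up to the final step, then install the
# toolkit package instead of the "cuda" meta-package so apt does not replace the driver
sudo apt-get install nvidia-cuda-toolkit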
The other way is to use nvidia-docker, released by NVIDIA. In this case you don't need to install CUDA, only the NVIDIA driver. The PyTorch Docker image, which includes CUDA, can be found at PyTorch | NVIDIA NGC.
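For reference, a minimal sketch of that route, assuming the NVIDIA Container Toolkit is already set up on the host; the image tag below is only an example placeholder, pick a current one from the NGC page:

# Only the NVIDIA driver is needed on the host; CUDA ships inside the image
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.10-py3
# Inside the container, a quick check that the GPU is visible to PyTorch
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"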
Hi NVIDIA, don't you think it is time to fix this? What are you waiting for? My two systems are completely unusable, and I don't want to waste more time with workarounds…
@amrits After 7 years of successful CUDA use on a couple of machines at our site, this is now the worst experience ever. And NVIDIA is 100% responsible for this mess.
REMINDER: You announced on the 10th of October that the issue has been root-caused and the fix is integrated into a future release driver.
Why don't you just build it and release it?
That works fine, but it is only the driver, not CUDA. Also, for the 40 series the workaround of installing CUDA 11.7 doesn't work, because we need 11.8 or higher for Ada GPUs, so I guess we are stuck until CUDA 12.0 is released next year.
To run PyTorch models I had to create a container (CUDA 11.8), and with it I can use my 4090 GPUs for training or inference. Keep in mind that since these GPUs are new and have a new SM architecture, you may need to recompile PyTorch or other packages to support them, or the code won't run. I hope NVIDIA hurries up and releases CUDA 12 so all of this is solved.
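As a rough sanity check before starting a long job (generic PyTorch calls, nothing specific to this container), you can verify whether the installed PyTorch binary was actually built for the Ada architecture:

# Compute capability of the first GPU; an RTX 4090 reports (8, 9), i.e. sm_89
python3 -c "import torch; print(torch.cuda.get_device_capability(0))"
# Architectures the installed PyTorch binary was compiled for; if sm_89 (or a
# compatible PTX fallback) is missing, kernels will not run on the 4090
python3 -c "import torch; print(torch.cuda.get_arch_list())"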
I also want to point out that we are experiencing restarts during long training sessions with Ubuntu 22.04 and the new drivers; this is not happening with 20.04.