I’ve recently been facing quite a few issues with my NVIDIA GeForce RTX 2070 Super with Max-Q Design on Linux kernel 5.15.0-89-generic.
It started when I somehow upgraded my kernel to 5.15.0-89-generic while still on Ubuntu 20.04. My system would frequently freeze and was slow to shut down or start up. Before the kernel update I was using driver version 470 with toolkit 11.4, and everything was working as expected.
After more or less isolating the issue to the graphics card (e.g. no problems noticed when fully switching to the internal Intel GPU with prime-select), I decided to upgrade Ubuntu altogether.
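For reference, the switching test was roughly the following (assuming Ubuntu's nvidia-prime package; a reboot, or at least re-login, is needed after switching):

    # check which GPU mode is currently active
    prime-select query

    # switch fully to the integrated Intel GPU, then reboot and test
    sudo prime-select intel
    sudo reboot

    # later, switch back to the NVIDIA GPU (or to on-demand / offload mode)
    sudo prime-select nvidia    # or: sudo prime-select on-demand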
On 22.04 I’ve been running driver version 525 & toolkit 12.0. This seems to have fixed my issues to a degree. While my system hasn’t completely frozen yet (instead it freezes for a few milliseconds now & then), it is still slow to start or shut down (sometimes I have to shut down by going into a tty, and at other times only Alt + SysRq (PrtSc) + REISUB will help), and sometimes the GPU even crashes altogether (meaning that the NVIDIA driver/toolkit crashes).
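(Side note, in case someone wants to try the same: REISUB only works if the Magic SysRq key is enabled. I checked/enabled it roughly like this, where 1 enables all SysRq functions:)

    # check which SysRq functions are currently allowed (Ubuntu's default is a bitmask, not 1)
    cat /proc/sys/kernel/sysrq

    # temporarily allow all of them so Alt + SysRq + R,E,I,S,U,B works
    echo 1 | sudo tee /proc/sys/kernel/sysrq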
I also experimented with kernel 6.2.0-37-generic, but this was even worse. First of all, only the latest driver (545) & toolkit (12.3) would work at all. But then, any time I gave the GPU any serious workload, it would simply crash. Basically not usable at all.
It’s important to note that I also have an external RTX 3090 GPU, which, however, hasn’t exhibited any of the issues the 2070 has.
So I guess what I’m looking for is some sort of confirmation that the above reasoning makes sense and that it indeed could be a driver/toolkit issue with the RTX 2070. Are there other driver/toolkit combinations I could still try for more stability on Ubuntu 22.04 with kernel 5.15?
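To show what I’m choosing between, this is roughly how I’ve been listing the driver series available on 22.04 (the 535 install line below is just an example, not a recommendation):

    # drivers Ubuntu recommends for the detected GPUs
    ubuntu-drivers devices

    # all nvidia-driver-* packages available from the enabled repositories
    apt-cache search ^nvidia-driver-

    # installing a specific series, e.g. 535 (example only)
    sudo apt install nvidia-driver-535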
I’m also a bit worried that this could be a physical issue with the GPU itself. Are there any tests I could run for that? If needed, I can provide nvidia-bug-report.sh logs.
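For what it’s worth, these are the kinds of basic checks I can run and share the output of (just nvidia-smi queries and a PCIe sanity check, nothing exhaustive):

    # overall GPU state, utilisation and driver version
    nvidia-smi

    # temperature, power and clock details for GPU 0
    nvidia-smi -q -d TEMPERATURE,POWER,CLOCK -i 0

    # confirm the GPU still shows up on the PCIe bus after a crash
    lspci | grep -i nvidia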
Yeah, true. I’ve noticed that one for quite some time now, when the system boots. Anything you’d recommend for that?
Perhaps I’ll re-share the bug report when the said issues recur. I’m not sure how far back the logs in this report are collected, or whether they somehow got cleared.
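In the meantime I’ll check how far back the kernel logs on this machine actually go, something along these lines:

    # list the boots journald still has logs for
    journalctl --list-boots

    # kernel messages from the previous boot, filtered to NVIDIA-related lines
    journalctl -k -b -1 | grep -iE 'nvrm|nvidia'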
For example, one of the errors I kept a note of was: nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c57d:0 2:0:4048:4040
This happened when the driver crashed (saying it could not find the GPU, or something like that) during a workload, and I rebooted the laptop.
This issue is only present when nvidia.ko is loaded, so it is not obvious that Intel is the culprit. The Intel driver is just displaying more debugging info; it might very well be just detecting some misbehaviour from nvidia.ko.
Maybe, but if the NVIDIA driver can take down the Intel driver while the NVIDIA GPU is in offload mode and sleeping, this points to some issue with the Intel driver, with the NVIDIA driver being the trigger rather than the root cause. Since the OP was on a specific kernel version when this started, the usual way to find the issue would be to do a kernel bisect.
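A minimal sketch of what such a bisect could look like, assuming a last-good and first-bad upstream kernel can be identified (the version tags below are placeholders; with Ubuntu's patched kernels you would first narrow things down using mainline builds):

    # clone the stable kernel tree (placeholder tags below, adjust to the real good/bad kernels)
    git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    cd linux

    git bisect start
    git bisect bad  v5.16    # placeholder: first kernel showing the freezes
    git bisect good v5.15    # placeholder: last kernel known to work

    # at each step: build, install and boot the bisected kernel, test the workload,
    # then report the result until git names the offending commit
    git bisect good    # or: git bisect bad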