Ubuntu 22.04 issue with nvidia drivers [3050 Ti mobile]

Hey, I have a ThinkPad X1 Extreme Gen 5 with an RTX 3050 Ti mobile, running Ubuntu 22.04.

I’ve been using the nvidia-driver-535 package and everything was fine until a recent apt upgrade, which I think changed the Ubuntu kernel. I then ran into errors and had to run dpkg --configure -a. Ever since, I have been having trouble with the nvidia driver.

Basically, the error I get is that the DKMS module for nvidia could not be built.

I tried uninstalling and reinstalling the nvidia driver, as well as different driver versions (525, 530, 535). I also tried a range of different kernels (including the kernel that worked before the issue), but nothing seems to solve the problem. I’m currently using the 6.4.10-3-liquorix-amd64 kernel.

So I’m asking here in case someone knows this issue or can help me solve it. Any ideas are welcome.

Thanks.

Please run nvidia-bug-report.sh as root and attach the resulting file here.

Sure, I have attached the bug report to this reply:

nvidia-bug-report.log.gz (930.6 KB)

Also to give you a bit more detail on the crash report:

When I run sudo apt install nvidia-driver-535 I get the following error:

Loading new nvidia-535.86.05 DKMS files...
Building for 6.4.10-3-liquorix-amd64
Building for architecture x86_64
Building initial module for 6.4.10-3-liquorix-amd64
ERROR (dkms apport): kernel package linux-headers-6.4.10-3-liquorix-amd64 is not supported
Error! Bad return status for module build on kernel: 6.4.10-3-liquorix-amd64 (x86_64)
Consult /var/lib/dkms/nvidia/535.86.05/build/make.log for more information.
dpkg: error processing package nvidia-dkms-535 (--configure):
 installed nvidia-dkms-535 package post-installation script subprocess returned error exit status 10
Setting up libnvidia-encode-535:amd64 (535.86.05-0ubuntu0.22.04.1) ...
Setting up libnvidia-encode-535:i386 (535.86.05-0ubuntu0.22.04.1) ...
dpkg: dependency problems prevent configuration of nvidia-driver-535:
 nvidia-driver-535 depends on nvidia-dkms-535 (<= 535.86.05-1); however:
  Package nvidia-dkms-535 is not configured yet.
 nvidia-driver-535 depends on nvidia-dkms-535 (>= 535.86.05); however:
  Package nvidia-dkms-535 is not configured yet.

dpkg: error processing package nvidia-driver-535 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for bamfdaemon (0.5.6+22.04.20220217-0ubuntu1) ...
Rebuilding /usr/share/applications/bamf-2.index...
Processing triggers for desktop-file-utils (0.26-1ubuntu3) ...
Processing triggers for initramfs-tools (0.140ubuntu13.2) ...
update-initramfs: Generating /boot/initrd.img-6.4.10-3-liquorix-amd64
W: Possible missing firmware /lib/firmware/amd/amd_sev_fam19h_model1xh.sbin for module ccp
Processing triggers for gnome-menus (3.36.0-1ubuntu3) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Errors were encountered while processing:
 nvidia-dkms-535
 nvidia-driver-535
E: Sub-process /usr/bin/dpkg returned an error code (1)

The error similarly happens with different kernels (it is not specific to the liquorix kernel).

I have also attached the nvidia-dkms-535.0.crash report generated by dpkg:

nvidia-dkms-535.0.crash (875.1 KB)

There seems to be something wrong with the build environment.

According to the dkms make.log:

make[1]: Entering directory ‘/usr/src/linux-headers-6.4.10-3-liquorix-amd64’

Which is the correct directory.
But then, where it fails:

scripts/mod/modpost -a -N -o /var/lib/dkms/nvidia/535.86.05/build/Module.symvers -T /var/lib/dkms/nvidia/535.86.05/build/modules.order -i Module.symvers -e -i /usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64/Module.symvers
/usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64/Module.symvers: No such file or directory
make[2]: *** [scripts/Makefile.modpost:136: /var/lib/dkms/nvidia/535.86.05/build/Module.symvers] Error 1

I wonder where this directory comes into play: /usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64
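
One way to see where that path comes from is to search the driver's DKMS build tree for it. This is only a rough sketch: the default directory is the build path shown in make.log above, and find_ofa_refs is a helper name I made up. My understanding is that the driver's nvidia-peermem build files probe /usr/src/ofa_kernel for Mellanox OFED sources, but treat that as an assumption until the search confirms it on your machine.

```shell
# Search a directory tree for references to "ofa_kernel".
# Default: the DKMS build directory named in make.log (adjust to your driver version).
# find_ofa_refs is a hypothetical helper name, not part of any NVIDIA tooling.
find_ofa_refs() {
    src="${1:-/var/lib/dkms/nvidia/535.86.05/build}"
    # -r: recurse, -n: show line numbers; unreadable files are silently skipped
    grep -rn "ofa_kernel" "$src" 2>/dev/null
}

# Example: find_ofa_refs
# Example: find_ofa_refs /path/to/other/source/tree
```

If this prints nothing, the reference comes from somewhere else (or the build directory has been cleaned up), so an empty result is not conclusive.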

/lib/modules/<KERNEL_VERSION>/build should point to /usr/src/<KERNEL_HEADER_VERSION>/ , where it should find the Module.symvers file.
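
A quick way to verify that, as a minimal sketch: check_build_dir and the MODULES_ROOT override are names I introduced for illustration (the override only exists so the check can be tried against a test tree), and the kernel version defaults to the running one.

```shell
# Check that /lib/modules/<version>/build resolves to a directory
# and that the directory contains Module.symvers.
# check_build_dir and MODULES_ROOT are hypothetical names for this sketch.
check_build_dir() {
    kver="${1:-$(uname -r)}"
    root="${MODULES_ROOT:-/lib/modules}"
    build="$root/$kver/build"

    # -d follows symlinks, so a dangling build link fails here
    if [ ! -d "$build" ]; then
        echo "missing: $build does not resolve to a directory"
        return 1
    fi
    echo "build dir: $(readlink -f "$build")"

    if [ -f "$build/Module.symvers" ]; then
        echo "ok: Module.symvers found"
    else
        echo "missing: $build/Module.symvers"
        return 1
    fi
}

# Example: check_build_dir 6.4.10-3-liquorix-amd64
```

If this reports ok for your kernel, the headers themselves are probably fine and the failing Module.symvers lookup is coming from the extra ofa_kernel path instead.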

Thanks for this pointer, I’m not really sure what the ofa_kernel does, and why it’s there to begin with. Apparently it’s something developed by nvidia (Version 515.105.01(Linux)/518.03(Windows) :: NVIDIA Data Center GPU Driver Documentation). I’m not even sure how to uninstall it, and if it’s safe to do so without breaking the OS.

I might have installed the MLNX drivers at some point, which could be the cause of the issue.

Do you know what I should do about it?

Not really.
I don’t even know what that is :-o

A quick search brought up this:

Uninstall the MLNX_OFED driver.
ofed_uninstall.sh

You have that file somewhere?

Luckily it’s Linux and you can literally fix everything - if you know how…
First, I’d look for the installation instructions of what you installed and try to revert that.
Second, look for which files were installed and move them manually to a temporary location, to see whether that fixes the issue and whether it breaks something else…
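
To find out what was installed, something along these lines might help. It is a sketch only: filter_ofed_packages is a made-up helper, and the name patterns (ofed, mlnx) are my guess at how the Mellanox packages are named, so adjust them if nothing shows up.

```shell
# Filter `dpkg -l` output (read from stdin) down to installed packages
# whose names look like Mellanox OFED packages.
# The patterns "ofed" and "mlnx" are assumptions about the package naming.
filter_ofed_packages() {
    # $1 is the dpkg status column ("ii" = installed), $2 is the package name
    awk '$1 == "ii" && tolower($2) ~ /ofed|mlnx/ { print $2 }'
}

# Example: dpkg -l | filter_ofed_packages
# Example: dpkg -S /usr/src/ofa_kernel    (which package owns the directory, if any)
```

If dpkg -S finds no owner, the files were probably put there by an installer script rather than a package, and moving them manually is the remaining option.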

Good luck ;-p

Thanks again for the pointer.

For now, I moved the directory /usr/src/ofa_kernel to a different location (to back it up in case something goes wrong). After that, I was able to install the Nvidia driver, and so far everything seems to be working.
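
In case it helps someone else, this is roughly what that looked like, written as a small sketch. backup_dir is a helper name invented for this post and /root/ofa_kernel.bak is just an example destination, not the location I actually used.

```shell
# Move a directory aside as a backup instead of deleting it.
# backup_dir is a hypothetical helper; it refuses to overwrite an existing destination.
backup_dir() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src"
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite: $dest"
        return 1
    fi
    mv "$src" "$dest" && echo "moved $src -> $dest"
}

# Example (as root; the destination path is only an example):
#   backup_dir /usr/src/ofa_kernel /root/ofa_kernel.bak
#   dpkg --configure -a    # let dpkg finish configuring nvidia-dkms-535
```

Moving the directory back restores the previous state if anything else turns out to depend on it.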

Thanks.
