Ubuntu 22.04 issue with nvidia drivers [3050 Ti mobile]

Hey, I have a ThinkPad X1 Extreme Gen 5 with an RTX 3050 Ti Mobile, running Ubuntu 22.04.

I’ve been using the nvidia-driver-535 package and everything was fine until a recent apt upgrade. I think the Ubuntu kernel changed or something. I then ran into errors and had to run dpkg --configure -a. Ever since, I have been having trouble with the NVIDIA driver.

Basically, the error I get is that the NVIDIA DKMS module could not be built.

I tried uninstalling and reinstalling the NVIDIA driver, trying different driver versions (525, 530, 535), and booting a range of different kernels (including the kernel that worked before the issue), but nothing solves the problem. I’m currently using the 6.4.10-3-liquorix-amd64 kernel.

I am hence asking this here in case someone knows this issue or can help me solve it. Any ideas are welcome.

Thanks.

Please run nvidia-bug-report.sh as root and attach the resulting file here.

Sure, I have attached the bug report to this reply:

nvidia-bug-report.log.gz (930.6 KB)

Also to give you a bit more detail on the crash report:

When I run sudo apt install nvidia-driver-535 I get the following error:

Loading new nvidia-535.86.05 DKMS files...
Building for 6.4.10-3-liquorix-amd64
Building for architecture x86_64
Building initial module for 6.4.10-3-liquorix-amd64
ERROR (dkms apport): kernel package linux-headers-6.4.10-3-liquorix-amd64 is not supported
Error! Bad return status for module build on kernel: 6.4.10-3-liquorix-amd64 (x86_64)
Consult /var/lib/dkms/nvidia/535.86.05/build/make.log for more information.
dpkg: error processing package nvidia-dkms-535 (--configure):
 installed nvidia-dkms-535 package post-installation script subprocess returned error exit status 10
Setting up libnvidia-encode-535:amd64 (535.86.05-0ubuntu0.22.04.1) ...
Setting up libnvidia-encode-535:i386 (535.86.05-0ubuntu0.22.04.1) ...
dpkg: dependency problems prevent configuration of nvidia-driver-535:
 nvidia-driver-535 depends on nvidia-dkms-535 (<= 535.86.05-1); however:
  Package nvidia-dkms-535 is not configured yet.
 nvidia-driver-535 depends on nvidia-dkms-535 (>= 535.86.05); however:
  Package nvidia-dkms-535 is not configured yet.

dpkg: error processing package nvidia-driver-535 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for bamfdaemon (0.5.6+22.04.20220217-0ubuntu1) ...
Rebuilding /usr/share/applications/bamf-2.index...
Processing triggers for desktop-file-utils (0.26-1ubuntu3) ...
Processing triggers for initramfs-tools (0.140ubuntu13.2) ...
update-initramfs: Generating /boot/initrd.img-6.4.10-3-liquorix-amd64
W: Possible missing firmware /lib/firmware/amd/amd_sev_fam19h_model1xh.sbin for module ccp
Processing triggers for gnome-menus (3.36.0-1ubuntu3) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Errors were encountered while processing:
 nvidia-dkms-535
 nvidia-driver-535
E: Sub-process /usr/bin/dpkg returned an error code (1)

The same error happens with other kernels too, so it is not specific to the Liquorix kernel.

I have also attached the nvidia-dkms-535.0.crash report generated by dpkg:

nvidia-dkms-535.0.crash (875.1 KB)

There seems to be something wrong with the build environment.

According to the dkms make.log:

make[1]: Entering directory '/usr/src/linux-headers-6.4.10-3-liquorix-amd64'

Which is the correct directory.
But then, where it fails:

scripts/mod/modpost -a -N -o /var/lib/dkms/nvidia/535.86.05/build/Module.symvers -T /var/lib/dkms/nvidia/535.86.05/build/modules.order -i Module.symvers -e -i /usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64/Module.symvers
/usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64/Module.symvers: No such file or directory
make[2]: *** [scripts/Makefile.modpost:136: /var/lib/dkms/nvidia/535.86.05/build/Module.symvers] Error 1

I wonder where this directory comes into play: /usr/src/ofa_kernel/x86_64/6.4.10-3-liquorix-amd64
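One way to find out where that path comes from would be to search the driver's build tree for the string. This is only a rough sketch: find_path_refs is a helper name made up here, and /usr/src/nvidia-535.86.05 is an assumption about where DKMS keeps the module source for this driver version (the build directory is the one from the log above).

```shell
#!/bin/sh
# Sketch: list every file under the given directories that mentions a string.
# find_path_refs is a hypothetical helper, not part of dkms or the NVIDIA driver.
find_path_refs() {
    needle="$1"
    shift
    # -r: recurse, -l: print file names only, -F: treat the needle as a fixed string
    grep -rlF -- "$needle" "$@" 2>/dev/null
}

# Usage on the affected system (the first path is an assumption):
#   find_path_refs ofa_kernel /usr/src/nvidia-535.86.05 /var/lib/dkms/nvidia/535.86.05/build
```

If a build file shows up in the output, that file is what pulls the extra Module.symvers into the modpost call.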

/lib/modules/<KERNEL_VERSION>/build should point to /usr/src/<KERNEL_HEADER_VERSION>/ , where it should find the Module.symvers file.
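A quick way to verify that could look like the sketch below. check_kernel_build_dir is a made-up helper name, and the optional second argument (an alternative modules root) is only there so the function can be tried against a scratch directory; neither comes from dkms or the driver.

```shell
#!/bin/sh
# Sketch: check that /lib/modules/<version>/build resolves to a headers tree
# that actually contains Module.symvers.
# check_kernel_build_dir is a hypothetical helper name.
check_kernel_build_dir() {
    kver="$1"
    modules_root="${2:-/lib/modules}"   # override is an assumption, for testing only
    build_link="$modules_root/$kver/build"

    # -e follows symlinks, so a dangling link is reported as missing too
    if [ ! -e "$build_link" ]; then
        echo "missing or dangling: $build_link"
        return 1
    fi

    target=$(readlink -f "$build_link")
    if [ ! -f "$target/Module.symvers" ]; then
        echo "no Module.symvers in $target"
        return 1
    fi

    echo "ok: $build_link -> $target"
}

# Usage on a live system:
#   check_kernel_build_dir "$(uname -r)"
```

If this reports ok for the running kernel, the headers themselves are fine and the failure has to come from the extra Module.symvers path.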


Thanks for the pointer. I’m not really sure what ofa_kernel does or why it’s there to begin with. Apparently it’s something developed by NVIDIA (Version 515.105.01(Linux)/518.03(Windows) :: NVIDIA Data Center GPU Driver Documentation). I’m not sure how to uninstall it, or whether it’s safe to do so without breaking the OS.

I might have installed the MLNX drivers at some point which could be the cause of the issue, but I’m not even sure how to uninstall the ofa_kernel.

Do you know what I should do about it?

Not really.
I don’t even know what that is :-o

A quick search brought up this:

Uninstall the MLNX_OFED driver.
ofed_uninstall.sh

You have that file somewhere?

Luckily it’s Linux and you can literally fix everything - if you know how…
First, I’d look up the installation instructions for whatever you installed and try to revert those steps.
Second, find out which files were installed and move them manually to a temporary location, to see whether that fixes the issue and whether it breaks something else…

Good luck ;-p

Thanks again for the pointer.

For now, I moved the directory /usr/src/ofa_kernel to a different location (to back it up in case something goes wrong). After that, I was able to install the Nvidia driver, and so far everything seems to be working.
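For anyone landing here with the same error, the workaround can be sketched roughly as follows. move_aside is a made-up helper and /root/ofa_kernel.bak is just an example backup location, not the path actually used above. It moves the directory instead of deleting it, so it can be put back if something else turns out to need it.

```shell
#!/bin/sh
# Sketch: move a directory aside without overwriting an existing backup.
# move_aside is a hypothetical helper name.
move_aside() {
    src="$1"
    backup="$2"
    if [ ! -d "$src" ]; then
        echo "nothing to move: $src" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$src" "$backup" && echo "moved $src -> $backup"
}

# On the affected system, as root (the backup path is an example):
#   move_aside /usr/src/ofa_kernel /root/ofa_kernel.bak
#   dpkg --configure -a    # finish the interrupted nvidia-dkms-535 setup
#   dkms status            # check that the nvidia module is now listed
```

To undo it, move the backup to /usr/src/ofa_kernel again.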

Thanks.
