Ubuntu 22.04 clean install fails to detect RTX 4060Ti

I have a desktop system (bought February 2024) with the following:

  • Intel Core i7-12700KF 3600 1700
  • MSI PRO B760-P DDR4 II
  • I3D 16GB D6 RTX 4060 Ti Twin X2

It used to run fine under Debian 12 with the Nvidia drivers. I can’t see the version anymore, but I think the trouble started during an upgrade from 525.147.05-7~deb12u1 to 535.183.01-1~deb12u1: the system became unresponsive after an "apt update && apt upgrade", and it has been going downhill since :( I ultimately declared my Debian installation broken beyond repair and decided to do a clean install of something else.

Tried several installs of Ubuntu 24.04 and Ubuntu 22.04 (also Lubuntu and Kubuntu), but none were successful.
The attempts with 24.04 resulted in a system that does not even show a login screen and only sporadically allows me to switch to a TTY: Ctrl-Alt-F2 etc. often do not respond, and when I do get to a tty there is a 3 to 5 second delay for each keypress. Once logged in on the tty, I can see a gnome process hogging all of the CPU. Scouring the internet for solutions has not resolved the issues yet.

With 22.04 and 24.04 I tried different drivers (535, 535-open and 545 from ubuntu-drivers, and the .run installer downloaded from Nvidia with the -m=kernel-open option) and different kernels (6.5, 6.8, 5.15). I am not experienced with that, so I may not have tried it in the correct ways; I ran into gcc issues caused by version differences, with 11 or 12 being the default.

To confirm there are no hardware issues, I installed Windows 11 and there Blender works fine, and recognises the GPU with CUDA and OptiX acceleration.

Where can I find clear instructions on how to set up Debian/Ubuntu on my system with GPU acceleration (I am using Blender and need OptiX or CUDA support, or both)?

What I did most recently:

  • download the Ubuntu ISO for 22.04
  • create USB boot stick (using balenaEtcher on a Mac)
  • run the installer, choosing the option for additional drivers
  • for 22.04 this results in a system where the Nvidia driver is not loaded: nvidia-smi returns ‘No devices were found’

I am attaching the nvidia-bug-report that is generated on my latest clean install of Ubuntu 22.04.
nvidia-bug-report.log.gz (117.7 KB)

Before once again trying apt remove --purge "*nvidia*" && apt install linux-headers-generic && apt install nvidia-dkms-NNN && apt install nvidia-driver-NNN, I hope someone can point me to instructions that have worked for a setup like mine.

Looking forward to tips or solutions (it’s been a month of trial-and-error).

Additional info:
lspci reports:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2803 (rev a1)

and lspci -v -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2803 (rev a1) (prog-if 00 [VGA controller])
Subsystem: InnoVISION Multimedia Ltd. Device 1903
Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 10
Memory at 80000000 (32-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=16G]
Memory at 4400000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at 81000000 [virtual] [disabled] [size=512K]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Now I ran sudo update-pciids, and then the lspci -v -s 01:00.0 command returns:

01:00.0 VGA compatible controller: NVIDIA Corporation AD106 [GeForce RTX 4060 Ti] (rev a1) (prog-if 00 [VGA controller])
Subsystem: InnoVISION Multimedia Ltd. Device 1903
Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 10
Memory at 80000000 (32-bit, non-prefetchable) [size=16M]
Memory at 4000000000 (64-bit, prefetchable) [size=16G]
Memory at 4400000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at 81000000 [virtual] [disabled] [size=512K]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
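
As the two outputs above show, update-pciids only changes the name that lspci prints ("Device 2803" versus "AD106 [GeForce RTX 4060 Ti]"); the card is the same either way. A small sketch for reading the numeric ID directly, which does not depend on the name database. The helper name pci_ids_of is made up for illustration, and the sample line is my assumption of what lspci -nn prints for this card (10de being NVIDIA's vendor ID, 2803 the device ID reported above):

```shell
#!/bin/sh
# pci_ids_of (hypothetical helper): extract the numeric "vendor:device" pair
# from one line of `lspci -nn` output read on stdin.
pci_ids_of() {
    # The last [xxxx:xxxx] group on the line is the vendor:device ID;
    # the class code (e.g. [0300]) has no colon and is skipped.
    sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p'
}

# Assumed sample line, shaped like `lspci -nn -s 01:00.0` before update-pciids:
echo '01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2803] (rev a1)' | pci_ids_of
# prints: 10de:2803
```

On a real system the equivalent would be: lspci -nn -s 01:00.0 | pci_ids_of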

After a reboot (needed because of the updates that were run), the nvidia driver still fails. From the kern.log:

Aug 31 09:10:40 borg kernel: [    4.000739] nvidia: loading out-of-tree module taints kernel.
Aug 31 09:10:40 borg kernel: [    4.000751] nvidia: module license 'NVIDIA' taints kernel.
Aug 31 09:10:40 borg kernel: [    4.000752] Disabling lock debugging due to kernel taint
Aug 31 09:10:40 borg kernel: [    4.000755] nvidia: module license taints kernel.
Aug 31 09:10:40 borg kernel: [    4.051123] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
Aug 31 09:10:40 borg kernel: [    4.051127] 
Aug 31 09:10:40 borg kernel: [    4.051955] nvidia 0000:01:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
Aug 31 09:10:40 borg kernel: [    4.083649] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
Aug 31 09:10:40 borg kernel: [    4.095716] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.183.01  Sun May 12 19:39:15 UTC 2024
Aug 31 09:10:40 borg kernel: [    4.107401] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.183.01  Sun May 12 19:31:08 UTC 2024
Aug 31 09:10:40 borg kernel: [    4.113065] intel_tcc_cooling: Programmable TCC Offset detected
Aug 31 09:10:40 borg kernel: [    4.119063] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Aug 31 09:10:40 borg kernel: [    4.123261] intel_rapl_msr: PL4 support detected.
Aug 31 09:10:40 borg kernel: [    4.123299] intel_rapl_common: Found RAPL domain package
Aug 31 09:10:40 borg kernel: [    4.123302] intel_rapl_common: Found RAPL domain core
Aug 31 09:10:41 borg kernel: [    4.150842] loop8: detected capacity change from 0 to 8
Aug 31 09:10:41 borg kernel: [    4.167867] scsi 8:0:0:0: Direct-Access     Generic  Mass-Storage     1.11 PQ: 0 ANSI: 2
Aug 31 09:10:41 borg kernel: [    4.167975] sd 8:0:0:0: Attached scsi generic sg2 type 0
Aug 31 09:10:41 borg kernel: [    4.176704] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-300:00: attached PHY driver (mii_bus:phy_addr=r8169-0-300:00, irq=MAC)
Aug 31 09:10:41 borg kernel: [    4.342831] r8169 0000:03:00.0 enp3s0: Link is Down
Aug 31 09:10:41 borg kernel: [    4.432318] sd 8:0:0:0: [sdb] Media removed, stopped polling
Aug 31 09:10:41 borg kernel: [    4.432558] sd 8:0:0:0: [sdb] Attached SCSI removable disk
Aug 31 09:10:41 borg kernel: [    4.593407] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x40:1468)
Aug 31 09:10:41 borg kernel: [    4.593519] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Aug 31 09:10:41 borg kernel: [    4.593960] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
Aug 31 09:10:41 borg kernel: [    4.594595] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
Aug 31 09:10:41 borg kernel: [    4.767604] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Aug 31 09:10:41 borg kernel: [    4.786651] nvidia-uvm: Loaded the UVM driver, major device number 511.
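
For anyone who wants to pull just the failure lines out of a log like the one above, a filter along these lines may help. This is only a sketch: the function name nvrm_failures is made up, and /var/log/kern.log is the default location on Ubuntu, which may differ on other systems.

```shell
#!/bin/sh
# nvrm_failures (hypothetical helper): read a kernel log on stdin and print
# only the NVIDIA driver failure lines.
nvrm_failures() {
    # Keep NVRM lines that report a failure, plus nvidia-drm "Failed to ..." errors.
    grep -E 'NVRM: .*failed|\[nvidia-drm\].*Failed to'
}

# Usage (paths are assumptions, adjust as needed):
#   nvrm_failures < /var/log/kern.log
#   sudo dmesg | nvrm_failures
```

Applied to the excerpt above, this keeps the two RmInitAdapter / rm_init_adapter lines and the two nvidia-drm errors, and drops the normal "loading" messages.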

Generated a new bugreport:
nvidia-bug-report.log.gz (124.3 KB)

UEFI/BIOS settings:


Yesterday I did a BIOS upgrade (from 1.40 to 1.70 - E7E29IMS.170), which activated secure boot again. I disabled secure boot, but restarting the system did not solve any issues.

Then I tried downgrading to kernel 5.15. Don’t do this: it eventually locked up my system during startup, regardless of the grub choice I made (booting 5.15 or 6.8 failed, even in recovery mode). I did a fresh install again after the following attempts:

  • apt install linux-image-generic linux-headers-generic
  • apt remove --autoremove linux-image-generic-hwe-22.04
  • ubuntu-drivers install nvidia:535-open
  • this started to ask for MOK key setup (in hindsight this was caused by the BIOS upgrade that activated secure boot without me noticing it)
  • drivers install failed
  • apt remove --purge "nvidia*"
  • ubuntu-drivers install nvidia:535-open
  • reboot
  • MOK key install dialog popped up (maybe I should have rebooted earlier, to complete this?)
  • boot 5.15 kernel → screen freezes up (also in recovery mode)
  • boot 6.8 kernel → screen freezes up (also in recovery mode)
  • deactivate secure boot in UEFI
  • boot any kernel, with and without recovery mode: screen freezes up, no option to switch to a tty (neither Ctrl-Fn nor Ctrl-Alt-Fn works)
  • RIP the system, perform a new install from USB.
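
In hindsight, checking the secure boot state before installing drivers would have explained the MOK prompts in the list above. A rough sketch of such a check, assuming mokutil is installed (it prints "SecureBoot enabled" or "SecureBoot disabled" for --sb-state); the helper name sb_state_of is made up for illustration:

```shell
#!/bin/sh
# sb_state_of (hypothetical helper): classify the text printed by
# `mokutil --sb-state` as enabled, disabled or unknown.
sb_state_of() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# Only query the firmware when mokutil is actually available.
if command -v mokutil >/dev/null 2>&1; then
    state=$(sb_state_of "$(mokutil --sb-state 2>/dev/null)")
else
    state=unknown
fi

echo "Secure Boot: $state"
if [ "$state" = enabled ]; then
    # With secure boot on, expect a MOK enrolment dialog on the next reboot.
    echo "Expect a MOK key prompt when installing nvidia drivers"
fi
```

If this reports "enabled" after a BIOS update, either finish the MOK enrolment on the next reboot or disable secure boot in the UEFI before installing the driver.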

After clean install, no nvidia drivers active (of course, been there :) ).

  • update-pciids
  • nvidia-smi reports “No devices were found”
  • kern.log shows same output as posted in my original report
  • nvidia-bug-report.sh output attached here:

nvidia-bug-report.log.gz (118.3 KB)

Some progress.
Starting point: clean install of Ubuntu 22.04 - uses kernel 6.8.0-40-generic (uname -r).

$ uname -a
Linux borg 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • sudo add-apt-repository ppa:graphics-drivers
  • sudo apt update
  • check which drivers are now available: ubuntu-drivers devices
    • for me this lists the 550, 555 and 560 drivers in several variants (open and proprietary), among others
  • switch to console (Ctrl-Meta-Alt-F3), login, sudo -i
  • stop X: systemctl stop display-manager
  • purge all nvidia drivers: apt remove --purge "^nvidia*"
  • ubuntu-drivers install nvidia:555-open
  • reboot
  • after login, open terminal and run nvidia-smi, output:
$ nvidia-smi
Mon Sep  2 10:58:48 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   34C    P8              5W /  165W |     321MiB /  16380MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1823      G   /usr/lib/xorg/Xorg                            183MiB |
|    0   N/A  N/A      2001      G   /usr/bin/gnome-shell                          122MiB |
+-----------------------------------------------------------------------------------------+
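
For reference, the steps above can be collected into one script. This is only a sketch of what worked on my system, not a general recipe: the names DRY_RUN, DRIVER, run and install_nvidia are my own, and by default it only prints the commands. Set DRY_RUN=0 and run it as root from a text console to actually execute them.

```shell
#!/bin/sh
# Sketch of the steps above. DRY_RUN defaults to 1, so the commands are
# printed instead of executed unless DRY_RUN=0 is set explicitly.
DRY_RUN=${DRY_RUN:-1}
# Driver that worked here; pick one from the `ubuntu-drivers devices` list.
DRIVER=${DRIVER:-nvidia:555-open}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_nvidia() {
    run add-apt-repository ppa:graphics-drivers   # add the driver PPA
    run apt update
    run systemctl stop display-manager            # stop X before touching drivers
    run apt remove --purge '^nvidia*'             # purge existing nvidia packages
    run ubuntu-drivers install "$DRIVER"
    echo "Reboot, then check with: nvidia-smi"
}

install_nvidia
```

Running it without arguments prints the five commands prefixed with "+ ", which is a cheap way to review them before committing to the real run.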

This is promising.

The post that got me on the right track: How to Install Nvidia Drivers on Ubuntu 24.04, 22.04, or 20.04 - LinuxCapable (Method 3)

Edit: I can confirm that Blender recognises the card as a CUDA and OptiX capable render target. All’s good.