Description
Hi Team,
I want to do DL/transformer model training and inference on my laptop, so I am trying to install the NVIDIA driver, CUDA Toolkit and cuDNN on Ubuntu 22.04 (dual boot).
I am facing issues while installing the NVIDIA driver.
Below are the steps I have taken to install it.
- Initially I tried to install the NVIDIA driver with the command "ubuntu-drivers autoinstall". This caused a black screen while booting Ubuntu, so I uninstalled it in recovery mode. I used sudo apt-get remove --purge '^nvidia-.*' to uninstall it.
- While installing the CUDA Toolkit from CUDA Toolkit 12.5 Downloads | NVIDIA Developer, I installed the NVIDIA driver as below
sudo apt-get install -y cuda-drivers
and it didn't seem to work, so I uninstalled the driver.
- Then I downloaded the run file from the official source (Linux x64 (AMD64/EM64T) Display Driver | 550.90.07 | Linux 64-bit | NVIDIA) and installed it. With that, nvidia-smi still didn't work.
- I disabled Secure Boot after installing the NVIDIA driver, but after that the laptop would not boot. I am not sure what caused this.
- Later I registered the public key generated in step 3 using mokutil --import "PUBLIC KEY PATH", rebooted, and enrolled that key in the blue MOK management screen. After this step the nvidia-smi command still didn't work.
- Then I created a public and private key using openssl before installing the NVIDIA driver and passed those keys to the installer, but ended up with the error below in the logs
- SSL error:FFFFFFFF80000002:system library::No such file or directory: …/crypto/bio/bss_file.c:67
- SSL error:10000080:BIO routines::no such file: …/crypto/bio/bss_file.c:75
sign-file: Nvidia.key
→ Failed to sign kernel module.
I can see the public key registered in the output of mokutil --list-enrolled.
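One possible reading of the "No such file or directory" lines is that the signing step could not open Nvidia.key, for example because it was given as a relative path. That is only an assumption based on the log excerpt. A minimal sketch of a pre-check that could be run before passing the keys to the installer is shown here; the function name check_signing_keys and the example paths are hypothetical, not taken from the installer.

```shell
#!/bin/sh
# Hypothetical helper: confirm both signing files exist and are readable
# before handing them to the driver installer.
check_signing_keys() {
    priv_key="$1"   # private key created with openssl
    pub_cert="$2"   # public certificate enrolled with mokutil --import
    status=0
    for f in "$priv_key" "$pub_cert"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use (placeholder absolute paths; adjust to where the keys really are):
#   check_signing_keys /path/to/Nvidia.key /path/to/Nvidia.der \
#       && echo "key files found"
```

Using absolute paths here avoids any dependence on the directory the installer happens to run from.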
Environment
GPU Type: GeForce RTX 4090
Nvidia Driver Version: 550.90.07
CUDA Version: 12.5
cuDNN Version: not installed
Operating System + Version: Ubuntu 22.04
Relevant Files
I am attaching the nvidia bug report
nvidia-bug-report.log (889.8 KB)
Log file from the installation at step 6
nvidia-installer_withcustomkey.log (38.0 KB)
Output of some helpful commands I found in discussions of similar issues
uname -r
6.5.0-41-generic
modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
dkms status
nvidia/550.90.07, 6.5.0-41-generic, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
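To confirm that the module really is built for the running kernel (so that the remaining problem is the signature rather than the build), the dkms status line can be checked against uname -r. This is a rough sketch that assumes the "module/version, kernel, arch: state" layout shown above; the function name dkms_installed_for is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `dkms status` output on stdin and succeeds if
# module $1 is reported as installed for kernel release $2.
dkms_installed_for() {
    module="$1"
    kernel="$2"
    awk -v m="$module" -v k="$kernel" -F', *' '
        # field 1 looks like "nvidia/550.90.07", field 2 is the kernel release
        index($1, m "/") == 1 && $2 == k && $0 ~ /: installed/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example use:
#   dkms status | dkms_installed_for nvidia "$(uname -r)" \
#       && echo "nvidia module is installed for the running kernel"
```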
lsmod | grep -E "nouveau|nvidia"
nvidia_wmi_ec_backlight 12288 0
video 73728 4 nvidia_wmi_ec_backlight,dell_wmi,dell_laptop,i915
wmi 40960 8 dell_wmi_sysman,video,nvidia_wmi_ec_backlight,dell_wmi_ddv,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor
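Note that the grep above also matches nvidia_wmi_ec_backlight, which is not the GPU driver itself. A small sketch of an exact-name check on the first lsmod column follows; module_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and succeeds only if
# a module named exactly $1 appears in the first column, so that
# nvidia_wmi_ec_backlight does not count as nvidia.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit(found ? 0 : 1) }'
}

# Example use:
#   lsmod | module_loaded nvidia  || echo "nvidia module is not loaded"
#   lsmod | module_loaded nouveau || echo "nouveau module is not loaded"
```

Applied to the three lines above, this would report that neither nvidia nor nouveau is loaded.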
dpkg -l | grep nvidia
I tried to uninstall the other NVIDIA versions shown in the screenshot above, but they are not getting uninstalled. The commands I tried are below
sudo apt-get remove --purge '^nvidia-.*'
sudo apt autoremove
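To see what is actually left after these commands, the dpkg -l listing can be split by its status column: "ii" means the package is still installed, while "rc" means it was removed and only its config files remain. Packages whose names do not start with nvidia- (for example libnvidia-* ones) are not matched by the '^nvidia-.*' pattern, which may be part of why some entries remain. A minimal sketch, with packages_in_state as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` lines on stdin and prints the names of
# packages whose status column (first field) equals $1, e.g. "ii" or "rc".
packages_in_state() {
    awk -v s="$1" '$1 == s { print $2 }'
}

# Example use:
#   dpkg -l | grep -i nvidia | packages_in_state ii   # still installed
#   dpkg -l | grep -i nvidia | packages_in_state rc   # only config files left
```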
I disabled nouveau as below
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf; sudo update-initramfs -u
Any help is appreciated @generix @MarkusHoHo
Best Regards