DKMS error with nvidia-driver-465, Ubuntu 20.04 LTS

I am using PyTorch and related packages that require CUDA 11.3, so to get everything running I wanted to install the NVIDIA graphics driver ‘nvidia-driver-465’.

It is listed among the available drivers, and I can also select it in the UI to update my drivers. However, I always get the error below. I already tried removing everything NVIDIA-related and doing a clean install, but I cannot make it work. The same happens with nvidia-driver-455, which would be the other option (PyTorch LTS with CUDA 11.1).

I can install every other NVIDIA driver version, i.e. the ones for CUDA 11.2, 11.4 or 11.6 (the last one is the latest and the recommended one).

I am using a fresh install of Ubuntu 20.04 LTS.
gcc version 9.4
5.13.0-35-generic, x86_64: installed

------------------------------
Deleting module version: 465.19.01
completely from the DKMS tree.
------------------------------
Done.
Loading new nvidia-465.19.01 DKMS files...
Building for 5.13.0-35-generic
Building for architecture x86_64
Building initial module for 5.13.0-35-generic

Error! Bad return status for module build on kernel: 5.13.0-35-generic (x86_64)
Consult /var/lib/dkms/nvidia/465.19.01/build/make.log for more information.
dpkg: error processing package nvidia-dkms-465 (--configure):
 installed nvidia-dkms-465 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver-465:
 nvidia-driver-465 depends on nvidia-dkms-465 (= 465.19.01-0ubuntu1); however:
  Package nvidia-dkms-465 is not configured yet.

dpkg: error processing package nvidia-driver-465 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for initramfs-tools (0.136ubuntu6.7) ...
update-initramfs: Generating /boot/initrd.img-5.13.0-35-generic
Errors were encountered while processing:
 nvidia-dkms-465
 nvidia-driver-465
E: Sub-process /usr/bin/dpkg returned an error code (1)
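The dpkg messages above only say that the DKMS build failed; the actual compiler error is in the make.log they point to. A minimal sketch for pulling the first gcc `error:` lines out of that log; the path is copied from the output above, and `first_errors` is a hypothetical helper, not part of DKMS.

```python
from __future__ import annotations

import re
from pathlib import Path

# Path copied from the dpkg output; adjust the driver version if needed.
MAKE_LOG = Path("/var/lib/dkms/nvidia/465.19.01/build/make.log")

# gcc reports problems as "file:line:col: error: ..." or "... fatal error: ...".
GCC_ERROR = re.compile(r":\s*(?:fatal )?error:")


def first_errors(log_text: str, limit: int = 5) -> list[str]:
    """Return up to `limit` compiler error lines, in the order they appear."""
    hits = [line.strip() for line in log_text.splitlines() if GCC_ERROR.search(line)]
    return hits[:limit]


try:
    text = MAKE_LOG.read_text(errors="replace")
except OSError as exc:  # missing file or insufficient permissions
    print(f"cannot read {MAKE_LOG}: {exc}")
else:
    for line in first_errors(text):
        print(line)
```

The first error line is usually the one worth searching for; later errors tend to be consequences of it.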

There's no need to install a specific driver for each CUDA version. You can just use driver 510 and run any CUDA toolkit version up to 11.6, the highest one it supports.
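The rule behind this is the driver's backward compatibility: a driver can run applications built with any toolkit up to the CUDA version it reports in nvidia-smi (11.6 for the 510.47.03 driver further down this thread). A minimal sketch of that check; `DRIVER_MAX_CUDA` and `driver_supports` are hypothetical names, and the table only lists the driver branches relevant to this thread, each paired with the highest CUDA version it supports.

```python
from __future__ import annotations

# Highest CUDA version each driver branch supports (as reported by nvidia-smi).
# Only the branches relevant to this thread; extend as needed.
DRIVER_MAX_CUDA: dict[int, tuple[int, int]] = {
    455: (11, 1),
    460: (11, 2),
    465: (11, 3),
    470: (11, 4),
    510: (11, 6),
}


def parse_version(text: str) -> tuple[int, int]:
    """Turn '11.3' into (11, 3); extra components are ignored."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def driver_supports(driver_version: str, toolkit_cuda: str) -> bool:
    """True if the driver can run code built with toolkit_cuda.

    Drivers are backward compatible: they run every toolkit up to the
    CUDA version they report, not only that exact version.
    """
    branch = int(driver_version.split(".")[0])
    max_cuda = DRIVER_MAX_CUDA[branch]  # KeyError for branches not in the table
    return parse_version(toolkit_cuda) <= max_cuda


print(driver_supports("510.47.03", "11.3"))  # True: 11.3 <= 11.6
print(driver_supports("465.19.01", "11.6"))  # False: 11.6 > 11.3
```

So a CUDA 11.3 build of PyTorch does not need the 465 driver; any newer branch works too.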

OK, thank you. But if I do that, I get this error from PyTorch (installed with conda):

RuntimeError: CUDA error: no kernel image is available for execution on the device

System Info:

echo $CUDA_HOME
/usr/local/cuda-11.3
torch.__version__
Out[6]: '1.11.0+cu113'
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
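To confirm that the local toolkit and PyTorch's CUDA build agree, you can compare the `release` field of `nvcc --version` with `torch.version.cuda`. A sketch, assuming the nvcc output format shown above; `nvcc_release` and `versions_match` are hypothetical helpers, and the nvcc text is pasted in rather than read from a live system so the snippet runs without a GPU.

```python
import re

# Output of `nvcc --version` as pasted above.
NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
"""


def nvcc_release(output: str) -> str:
    """Extract the 'release X.Y' field from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return match.group(1)


def versions_match(nvcc_output: str, torch_cuda: str) -> bool:
    """Compare nvcc's release with torch.version.cuda (e.g. '11.3')."""
    return nvcc_release(nvcc_output) == torch_cuda


# In a real session, torch_cuda would come from torch.version.cuda.
print(nvcc_release(NVCC_OUTPUT))            # 11.3
print(versions_match(NVCC_OUTPUT, "11.3"))  # True
```

Keep in mind that the conda PyTorch package brings its own CUDA runtime (the cudatoolkit package), so the system nvcc mainly matters when compiling extensions against PyTorch.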

According to Stack Overflow and elsewhere, this error means that the PyTorch and CUDA versions don't match, but that's not the case here. The only mismatch is the CUDA version reported by the driver:

nvidia-smi
Thu Mar 17 13:01:40 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |

Any ideas or suggestions?

The error message means that the application you’re trying to run has no CUDA kernel compiled for the GPU you are using (the GPU is too new for it). The only way around this is to use a newer PyTorch version that supports your GPU’s compute capability (cc/sm level).
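Put differently, the GPU's compute capability (what `torch.cuda.get_device_capability()` returns, e.g. (8, 6) for an Ampere GeForce card such as the RTX 3090) must be covered by one of the architectures the binary was built for; `torch.cuda.get_arch_list()` lists PyTorch's own. A sketch of that check under the usual CUDA compatibility rules: an `sm_XY` cubin runs on GPUs with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any GPU at or above X.Y. `can_run` is a hypothetical helper, and the arch lists below are example values, not read from a real install.

```python
from __future__ import annotations


def parse_arch(arch: str) -> tuple[str, int, int]:
    """Split 'sm_86' into ('sm', 8, 6) and 'compute_70' into ('compute', 7, 0)."""
    kind, digits = arch.split("_")
    return kind, int(digits[:-1]), int(digits[-1])


def can_run(device_cc: tuple[int, int], arch_list: list[str]) -> bool:
    """True if any entry in arch_list can execute on a GPU with device_cc."""
    major, minor = device_cc
    for arch in arch_list:
        kind, a_major, a_minor = parse_arch(arch)
        # Native cubin: same major version, minor not newer than the GPU's.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for this or any newer GPU.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


rtx_3090 = (8, 6)
print(can_run(rtx_3090, ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]))  # True
print(can_run(rtx_3090, ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75"]))  # False: "no kernel image"
```

On a real machine you would call `can_run(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())`. Note that this only covers PyTorch's own kernels, not those of separately compiled extensions.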

I'm using the newest PyTorch version:

PyTorch 1.11 with cudatoolkit 11.3

Which GPU model?

GeForce RTX 3090

That should be fully compatible. What’s the output of
torch.version.cuda

Maybe I'll try setting everything up from scratch; perhaps a dependency broke at some point early on?

Because the configuration itself seems fine:

torch.version.cuda
Out[4]: '11.3'

In general, thanks for your fast response! That's really cool!

I guess so. CUDA kernels for sm_86 should ship with PyTorch since 1.8, so maybe some dependency was built only against CUDA 10 and is now blocking the rest.
How did you install PyTorch and the CUDA toolkit?
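The suspicion about CUDA 10 rests on the fact that a toolkit can only emit native code for architectures it knows: sm_80 first appeared in CUDA 11.0 and sm_86 in CUDA 11.1, so an extension built with CUDA 10.x cannot contain native sm_86 kernels and, unless it also embedded PTX, fails on an RTX 3090 with the "no kernel image" error. A small lookup sketch; the table is partial and `can_target` is a hypothetical helper.

```python
from __future__ import annotations

# First CUDA toolkit release able to compile for each architecture
# (partial table, recent NVIDIA GPU generations only).
MIN_CUDA_FOR_ARCH: dict[str, tuple[int, int]] = {
    "sm_70": (9, 0),   # Volta
    "sm_75": (10, 0),  # Turing
    "sm_80": (11, 0),  # Ampere A100
    "sm_86": (11, 1),  # Ampere GeForce RTX 30xx, e.g. RTX 3090
}


def can_target(build_cuda: str, arch: str) -> bool:
    """True if a toolkit of version build_cuda (e.g. '10.2') can emit native code for arch."""
    major, minor = (int(part) for part in build_cuda.split(".")[:2])
    return (major, minor) >= MIN_CUDA_FOR_ARCH[arch]


print(can_target("10.2", "sm_86"))  # False: built against CUDA 10, no sm_86 code
print(can_target("11.3", "sm_86"))  # True
```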

Yes, I did. I redid everything from scratch, and it turned out the repo I used had some precompiled dependencies built with CUDA 10. After a clean reinstall and recompiling everything, it works now.