Can't get the NVIDIA 460 kernel module to build on Ubuntu 20.04 for a node with two A100s

Hello all,

I have a node with two A100s that I can access only remotely, via SSH. The node boots an Ubuntu 20.04 image via TFTP. The NVIDIA 460 driver was installed as part of CUDA, following these instructions. I blacklisted nouveau, rebuilt the initrd image and made sure TFTP serves it. But the nvidia kernel module does not get built, even though I think I have everything I need.

root@node21:/# dpkg -l nvidia*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                             Version            Architecture Description
+++-================================-==================-============-=====================================================
un  nvidia-304                       <none>             <none>       (no description available)
un  nvidia-340                       <none>             <none>       (no description available)
un  nvidia-384                       <none>             <none>       (no description available)
un  nvidia-390                       <none>             <none>       (no description available)
un  nvidia-common                    <none>             <none>       (no description available)
ii  nvidia-compute-utils-460         460.27.04-0ubuntu1 amd64        NVIDIA compute utilities
ii  nvidia-dkms-460                  460.27.04-0ubuntu1 amd64        NVIDIA DKMS package
un  nvidia-dkms-kernel               <none>             <none>       (no description available)
ii  nvidia-driver-460                460.27.04-0ubuntu1 amd64        NVIDIA driver metapackage
un  nvidia-driver-binary             <none>             <none>       (no description available)
un  nvidia-kernel-common             <none>             <none>       (no description available)
ii  nvidia-kernel-common-460         460.27.04-0ubuntu1 amd64        Shared files used with the kernel module
un  nvidia-kernel-source             <none>             <none>       (no description available)
ii  nvidia-kernel-source-460         460.27.04-0ubuntu1 amd64        NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau-driver <none>             <none>       (no description available)
un  nvidia-legacy-340xx-vdpau-driver <none>             <none>       (no description available)
un  nvidia-libopencl1-dev            <none>             <none>       (no description available)
ii  nvidia-modprobe                  460.27.04-0ubuntu1 amd64        Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                <none>             <none>       (no description available)
un  nvidia-persistenced              <none>             <none>       (no description available)
ii  nvidia-prime                     0.8.14             all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                  460.27.04-0ubuntu1 amd64        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary           <none>             <none>       (no description available)
un  nvidia-smi                       <none>             <none>       (no description available)
un  nvidia-utils                     <none>             <none>       (no description available)
ii  nvidia-utils-460                 460.27.04-0ubuntu1 amd64        NVIDIA driver support binaries
un  nvidia-vdpau-driver              <none>             <none>       (no description available)

All build components are also installed:

root@node21:/# dpkg -l make build-essential linux-headers-5.4.0*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                           Version       Architecture Description
+++-==============================-=============-============-========================================================
ii  build-essential                12.8ubuntu1.1 amd64        Informational list of build-essential packages
ii  linux-headers-5.4.0-58         5.4.0-58.64   all          Header files related to Linux kernel version 5.4.0
ii  linux-headers-5.4.0-58-generic 5.4.0-58.64   amd64        Linux kernel headers for version 5.4.0 on 64 bit x86 SMP
ii  make                           4.2.1-1.2     amd64        utility for directing compilation

Yet when I try to modprobe nvidia, it says it’s not there; /dev/nvidia* files are also missing and nvidia-modprobe doesn’t do anything. If I try to reinstall nvidia-dkms-460, I get the following:

root@node21:/home/users/andrej# apt reinstall nvidia-dkms-460
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 29.5 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  nvidia-dkms-460 460.27.04-0ubuntu1 [29.5 kB]
Fetched 29.5 kB in 0s (221 kB/s)            
(Reading database ... 96625 files and directories currently installed.)
Preparing to unpack .../nvidia-dkms-460_460.27.04-0ubuntu1_amd64.deb ...
Removing all DKMS Modules
Done.
Unpacking nvidia-dkms-460 (460.27.04-0ubuntu1) over (460.27.04-0ubuntu1) ...
Setting up nvidia-dkms-460 (460.27.04-0ubuntu1) ...
update-initramfs: deferring update (trigger activated)

A modprobe blacklist file has been created at /etc/modprobe.d to prevent Nouveau
from loading. This can be reverted by deleting the following file:
/etc/modprobe.d/nvidia-graphics-drivers.conf

A new initrd image has also been created. To revert, please regenerate your
initrd by running the following command after deleting the modprobe.d file:
`/usr/sbin/initramfs -u`

*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can   ***
*** be loaded.                                                            ***
*****************************************************************************

INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
Loading new nvidia-460.27.04 DKMS files...
Building for 5.4.0-58-generic
Building for architecture x86_64
Module build for kernel 5.4.0-58-generic was skipped since the
kernel headers for this kernel does not seem to be installed.
Processing triggers for initramfs-tools (0.136ubuntu6.3) ...
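The DKMS message above says the headers do not seem to be installed, even though dpkg lists linux-headers-5.4.0-58-generic as installed. That combination suggests DKMS is not finding the headers where the package put them. The sketch below is one hedged way to tell the cases apart. It assumes (this is not confirmed by the log) that DKMS reaches the headers through /lib/modules/<kernel>/build, the link that the linux-headers-*-generic package normally provides, and check_kernel_headers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the headers for KERNEL are reachable
# the way DKMS is assumed to look for them, via MODULES_ROOT/KERNEL/build.
# MODULES_ROOT and SRC_ROOT default to the real system locations; they are
# parameters only so the logic can be tried on a scratch directory.
check_kernel_headers() {
    local kernel="$1"
    local modroot="${2:-/lib/modules}"
    local srcroot="${3:-/usr/src}"
    local build="$modroot/$kernel/build"
    local headers="$srcroot/linux-headers-$kernel"

    if [ ! -d "$headers" ]; then
        echo "missing-headers"    # the linux-headers package is not installed
        return 2
    fi
    if [ -L "$build" ] && [ ! -e "$build" ]; then
        echo "dangling-link"      # build exists but points nowhere
        return 1
    fi
    if [ ! -e "$build" ]; then
        echo "missing-link"       # headers installed, but no build link to them
        return 1
    fi
    echo "ok"
}

# Example on the node:
#   check_kernel_headers "$(uname -r)"
```

On a node in this state, missing-link would be consistent with both the DKMS message and the dpkg listing.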

Long story short, I tried everything I could think of and I still can’t get the module to build. I’m attaching the debug log for further information in case it’s helpful.

Thanks in advance for any and all insight and suggestions on what to try next!

nvidia-bug-report.log.gz (72.4 KB)

Please check whether the build symlink that DKMS uses to locate the kernel headers exists:

ls /lib/modules/$(uname -r)/build

If it is missing, create it:

ln -s /usr/src/linux-headers-$(uname -r)  /lib/modules/$(uname -r)/build
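The same fix can be written so that it is safe to re-run, for example from a provisioning script for the netboot image. This is a sketch: ensure_build_link is a hypothetical name, and it only runs the equivalent of the ln -s command above when the link is absent or dangling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make MODULES_ROOT/KERNEL/build point at
# SRC_ROOT/linux-headers-KERNEL, but only if it is not already usable.
# The root paths default to the real locations and are parameters only so
# the logic can be tried on a scratch directory first.
ensure_build_link() {
    local kernel="$1"
    local modroot="${2:-/lib/modules}"
    local srcroot="${3:-/usr/src}"
    local build="$modroot/$kernel/build"
    local headers="$srcroot/linux-headers-$kernel"

    if [ ! -d "$headers" ]; then
        echo "no headers at $headers; install linux-headers-$kernel first" >&2
        return 1
    fi
    if [ -e "$build" ]; then
        return 0                  # link (or directory) already resolves
    fi
    mkdir -p "$modroot/$kernel"
    # -f replaces a dangling link, -n stops ln from following an existing one
    ln -sfn "$headers" "$build"
}
```

On the node, run it as root with ensure_build_link "$(uname -r)", then rebuild the module (for example by reinstalling nvidia-dkms-460 as in the question, or with dkms install nvidia/460.27.04 -k "$(uname -r)") and load it with modprobe nvidia. Since the node boots its image over TFTP, the link presumably has to exist inside that image to survive a reboot.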

That did the trick! Perfect! Thank you!!