Installing nvidia driver in chroot

Hello,

I am wondering if it is possible to install an nvidia driver in chroot. I am creating a Debian server image that will be used in a diskless boot. I want to be able to execute nvidia-smi on the system performing the diskless boot to query data such as temperature.

Running nvidia-detect on the system performing the diskless boot tells me that I should install the nvidia-driver package. However, when trying to install this package I get the following errors.

dpkg: error processing package nvidia-kernel-dkms (--configure):
installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of nvidia-driver:
nvidia-driver depends on nvidia-kernel-dkms (= 418.152.00-1) | nvidia-kernel-418.152.00; however:
Package nvidia-kernel-dkms is not configured yet.
Package nvidia-kernel-418.152.00 is not installed.
Package nvidia-kernel-dkms which provides nvidia-kernel-418.152.00 is not configured yet.

dpkg: error processing package nvidia-driver (--configure):
dependency problems - leaving unconfigured
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.38.1+dfsg-1) …
Processing triggers for libc-bin (2.28-10) …
Processing triggers for initramfs-tools (0.133+deb10u1) …
update-initramfs: Generating /boot/initrd.img-4.19.0-13-amd64
live-boot: core filesystems devices utils udev blockdev dns.
Processing triggers for update-glx (1.0.0) …
Processing triggers for glx-alternative-nvidia (1.0.0) …
update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in auto mode
Processing triggers for glx-alternative-mesa (1.0.0) …
Processing triggers for systemd (241-7~deb10u5) …
Processing triggers for libc-bin (2.28-10) …
Processing triggers for initramfs-tools (0.133+deb10u1) …
update-initramfs: Generating /boot/initrd.img-4.19.0-13-amd64
live-boot: core filesystems devices utils udev blockdev dns.
Errors were encountered while processing:
nvidia-kernel-dkms
nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

I read in the Linus Tech Tips thread “Install nvidia drivers without gpu” (Graphics Cards section) that it is not possible to install the drivers without the card installed.

Is there some way to install the driver, or at least the appropriate packages, without the card installed while making the image? The card would be in the system performing the diskless boot, not in the system where I am creating the image.

Any help or advice with this would be greatly appreciated. I’m very new to installing nvidia drivers in general and to the forum so it’s entirely possible I may be doing something clearly wrong.

Hi! The error seems to occur while nvidia-kernel-dkms is running its post-installation script. From what I know, that is most likely the point where it tries to compile the driver for the currently installed kernel or kernels.
The output doesn’t show the specific error, though, so it is hard to tell exactly where it is failing.
dkms is a system that rebuilds drivers every time the kernel changes. The first thing to do may be to find out how to get access to the full build log.
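
For anyone looking for that log: DKMS normally keeps the compiler output of each build attempt in a make.log under its tree, which defaults to /var/lib/dkms. A minimal sketch for locating those files follows; the function name is made up for this example, and the path layout is the usual default rather than something confirmed for this particular image.

```shell
#!/bin/sh
# List DKMS build logs. DKMS usually writes the compiler output of a build to
#   <tree>/<module>/<version>/build/make.log
# Assumption: the tree is /var/lib/dkms unless another root is passed in.
dkms_build_logs() (
    tree="${1:-/var/lib/dkms}"
    [ -d "$tree" ] || { echo "no DKMS tree at $tree" >&2; exit 1; }
    find "$tree" -type f -name make.log
)

# Example (inside the chroot): show the tail of every log that was found.
# dkms_build_logs | while read -r log; do echo "== $log"; tail -n 30 "$log"; done
```

The last lines of make.log are usually where the real compile or configuration error shows up.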

Do you have a kernel image and headers installed in the chroot?
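
One way to answer that question from inside (or outside) the chroot: inside a chroot, uname -r still reports the host’s kernel, so it is more reliable to look at what is installed under lib/modules. The sketch below assumes the Debian layout where the matching linux-headers package provides a build entry under each kernel’s module directory; the function name is hypothetical.

```shell
#!/bin/sh
# For every kernel found under <root>/lib/modules, report whether its build
# directory (normally provided by the matching linux-headers package) exists.
# Pass the chroot path as the first argument, or nothing to check the current root.
check_kernel_headers() (
    root="${1:-}"
    for dir in "$root"/lib/modules/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        ver=$(basename "$dir")
        if [ -e "${dir}build" ]; then      # -e follows the symlink, so a dangling link counts as missing
            echo "$ver: headers present"
        else
            echo "$ver: headers MISSING"
        fi
    done
)

# Example: check_kernel_headers /path/to/chroot
```

DKMS can only build the module for kernels whose headers show up as present here.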

Now that I think about it… I did not install nvidia-kernel-dkms before trying to install the driver. Can’t believe I didn’t do that after seeing the error message. I guess that could explain why it says “Errors were encountered while processing: nvidia-kernel-dkms.” Let me try installing that tomorrow morning before trying to install the driver. The system is in our lab and I don’t have remote access to it at the moment unfortunately. But I’ll report back and appreciate the feedback.

Thank you for the feedback. I didn’t know that about dkms. I’m still very new to this and learning as I go. As I mentioned below, I just realized that I never actually installed nvidia-kernel-dkms, which was a huge mistake on my end, and I can’t believe I forgot to do that. Tomorrow morning when I have access to the system I’ll try installing nvidia-kernel-dkms, then run the installer and report back.

What installer are you running?
If you are using the nvidia installer I am not sure you want to install that package as it is part of a packaged set of drivers usually.
Debian provides pretty up to date drivers it seems:
https://wiki.debian.org/NvidiaGraphicsDrivers#bullseye-460
Is there a reason you cannot just install those packages into your chroot? It is usually preferred over using the nvidia installer.

I had installed nvidia-detect and ran it to try to figure out what drivers to install. What I got back was, “Your card is supported by the default drivers and legacy driver series 390. It is recommended to install the nvidia-driver” package, which is what I was trying to install by running “apt install nvidia-driver.”

I can definitely try that as well. I do know the card I’m using is an older Quadro card, so I’m wondering if I need to follow the legacy GPUs section on the page you provided. That looks like what I was attempting to do, except that I forgot to install the proper kernel headers, so I’ll definitely need to do that as well. Will report back tomorrow after I try a couple more things in the lab.

Update on a few things I tried this morning. Per the documentation on the Debian page here I installed the kernel headers for the nvidia driver to build with. I then added “contrib” and “non-free” to sources.list per the documentation and then ran “apt update” followed by “apt install nvidia-driver firmware-misc-nonfree.” This provided me with the following error messages:

grep: /proc/cpuinfo: No such file or directory
It is likely that 4.19.0-11-amd64 belongs to a chroot’s host
Building for 4.19.0-14-amd64
/usr/sbin/dkms: line 2051: /dev/fd/62: No such file or directory
/usr/sbin/dkms: line 1982: /dev/fd/62: No such file or directory
dpkg: error processing package nvidia-kernel-dkms (--configure):
installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of nvidia-driver:
nvidia-driver depends on nvidia-kernel-dkms (= 418.181.07-1) | nvidia-kernel-418.181.07; however:
Package nvidia-kernel-dkms is not configured yet.
Package nvidia-kernel-418.181.07 is not installed.
Package nvidia-kernel-dkms which provides nvidia-kernel-418.181.07 is not configured yet.

dpkg: error processing package nvidia-driver (--configure):
dependency problems - leaving unconfigured
Processing triggers for initramfs-tools (0.133+deb10u1) …
update-initramfs: Generating /boot/initrd.img-4.19.0-14-amd64
live-boot: core filesystems devices utils udev blockdev dns.
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.38.1+dfsg-1) …
Processing triggers for libc-bin (2.28-10) …
Processing triggers for update-glx (1.0.0) …
Processing triggers for glx-alternative-nvidia (1.0.0) …
update-alternatives: using /usr/lib/nvidia to provide /usr/lib/glx (glx) in auto mode
Processing triggers for glx-alternative-mesa (1.0.0) …
Processing triggers for systemd (241-7~deb10u6) …
Processing triggers for libc-bin (2.28-10) …
Processing triggers for initramfs-tools (0.133+deb10u1) …
update-initramfs: Generating /boot/initrd.img-4.19.0-14-amd64
live-boot: core filesystems devices utils udev blockdev dns.
Errors were encountered while processing:
nvidia-kernel-dkms
nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

Unsure of what else to try, I tried to install nvidia-kernel-dkms on its own, which gave me the following:
Done.
Loading new nvidia-current-418.181.07 DKMS files…
grep: /proc/cpuinfo: No such file or directory
It is likely that 4.19.0-11-amd64 belongs to a chroot’s host
Building for 4.19.0-14-amd64
/usr/sbin/dkms: line 2051: /dev/fd/62: No such file or directory
/usr/sbin/dkms: line 1982: /dev/fd/62: No such file or directory
dpkg: error processing package nvidia-kernel-dkms (--configure):
installed nvidia-kernel-dkms package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of nvidia-driver:
nvidia-driver depends on nvidia-kernel-dkms (= 418.181.07-1) | nvidia-kernel-418.181.07; however:
Package nvidia-kernel-dkms is not configured yet.
Package nvidia-kernel-418.181.07 is not installed.
Package nvidia-kernel-dkms which provides nvidia-kernel-418.181.07 is not configured yet.

dpkg: error processing package nvidia-driver (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:
nvidia-kernel-dkms
nvidia-driver
E: Sub-process /usr/bin/dpkg returned an error code (1)

Could this be caused by the fact that I’m doing this inside a chroot, where some directories stay empty until the image is loaded after PXE booting? For example, I see it complaining “grep: /proc/cpuinfo: No such file or directory,” and I know /proc does not have anything inside of it until the image is actually loaded in memory on the system performing the diskless boot.

I also see “/usr/sbin/dkms: line 2051: /dev/fd/62: No such file or directory.” The fd directory is there inside /dev, but I cannot navigate into it. I believe the reason is that /dev/fd is a symbolic link to /proc/self/fd. With that being said, could my issue be caused by the fact that there is nothing inside /proc in the chroot? Do I need to manually create these directories? Thanks again in advance for any help or advice you can provide.
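
The reasoning above can be checked directly. The sketch below tests the three paths that matter here: /proc/cpuinfo, /proc/self/fd, and /dev/fd, which bash uses for process substitution (most likely what the /dev/fd/62 lines in the dkms output come from). The function name is made up for this example.

```shell
#!/bin/sh
# Report whether the pseudo-filesystem paths that DKMS tripped over exist
# under a given root. With no argument it checks the current root, which is
# what you want when running it inside the chroot.
check_pseudo_fs() (
    root="${1:-}"
    status=0
    for path in /proc/cpuinfo /proc/self/fd /dev/fd; do
        if [ -e "$root$path" ]; then       # -e follows symlinks, so a dangling /dev/fd is "missing"
            echo "ok      $path"
        else
            echo "missing $path"
            status=1
        fi
    done
    exit "$status"                          # non-zero if anything is missing
)

# Example: check_pseudo_fs || echo "/proc is probably not mounted in this chroot"
```

If all three are reported missing, creating the directories by hand will not help; they only appear once a proc filesystem is mounted at /proc.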

I don’t understand this at all. What’s your full definition of a diskless boot?

And why can’t you check temperatures with something other than nvidia-smi?

And when are you checking these vitals? During a network clone or install? I don’t get it.

1 of us doesn’t know how dkms works.

Why wouldn’t you just install the Nvidia-dkmsless driver? Headless or non-headless? Don’t bother converting over to the module kernel. It’s a nightmare waste of time. Change keyrings and scrub every kernel conf. Just get the dkmsless iso, then install the nvidia kernel module. I had to install 3 cascading nvidia dkmsless modules on a kernel 5.10.10 base to get it going. It benches way better than dkms. So much better. I tried sharing the find but copped whack flack about GPL and lawyers. Ubuntu made it available. I just fell across it. I’m really tired; hopefully we’re talking about the same thing. Let me know if I’m way off topic.

Do you have the /proc and /dev mounts in the chroot?
https://wiki.debian.org/chroot
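
For reference, a sketch of what those mounts usually look like when done from the build host before entering the chroot. The chroot path in the example is hypothetical, and DRY_RUN is just a convention of this sketch so the commands can be printed and reviewed before anything is mounted.

```shell
#!/bin/sh
# Print (DRY_RUN=1) or execute a command. Helper for the two functions below.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Mount the pseudo-filesystems a chroot usually needs so that package
# scripts such as DKMS can run. Run as root on the build host.
mount_chroot_fs() (
    root="$1"
    run mount -t proc proc "$root/proc" &&
    run mount -t sysfs sysfs "$root/sys" &&
    run mount --bind /dev "$root/dev" &&
    run mount --bind /dev/pts "$root/dev/pts"
)

# Undo the mounts in reverse order so nested mounts are released first.
umount_chroot_fs() (
    root="$1"
    run umount "$root/dev/pts"
    run umount "$root/dev"
    run umount "$root/sys"
    run umount "$root/proc"
)

# Example with a hypothetical path:
#   DRY_RUN=1; mount_chroot_fs /srv/chroot     # review the commands first
```

Unmounting before building the image matters, since otherwise the image tool may walk into the host’s /proc and /dev.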

So I believe I got the driver to install by following the below steps in the following order:

1.) apt install linux-headers-amd64
2.) added “contrib” and “non-free” to /etc/apt/sources.list
3.) apt update
4.) mount proc /proc -t proc
5.) apt install nvidia-driver
6.) umount /proc

Then proceeded with running mksquashfs and creating my image. The node will now PXE boot and load the image and appears to have the driver installed as I can navigate to /proc/driver/nvidia/ which I definitely wasn’t able to do before.
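
The steps above can be sketched as a small script to run as root inside the chroot. This is only an illustration of the sequence described in the post: it assumes “contrib” and “non-free” are already in /etc/apt/sources.list, it runs apt update first rather than third, and the DRY_RUN switch and function names are inventions of this sketch.

```shell
#!/bin/sh
# Print (DRY_RUN=1) or execute a command.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Install kernel headers and the NVIDIA driver with /proc mounted, then
# unmount /proc again so the image build does not pick up a mounted /proc.
install_nvidia_in_chroot() (
    run apt-get update || exit 1
    run apt-get install -y linux-headers-amd64 || exit 1
    run mount -t proc proc /proc || exit 1
    status=0
    run apt-get install -y nvidia-driver || status=$?
    # Always unmount, even if the install failed.
    run umount /proc
    exit "$status"
)

# Example: DRY_RUN=1; install_nvidia_in_chroot    # print the commands only
```

Keeping the unmount unconditional means a failed driver install still leaves the chroot in a clean state for the next attempt.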

Of course I have another issue now. When I try to run nvidia-smi I get “command not found.” I saw a recommendation online to make sure secure boot is disabled in the BIOS, which it is. So I obviously did something wrong.

sudo apt install nvidia-smi

It’s definitely me that doesn’t know how dkms works. I have zero experience installing nvidia drivers up until now and have very limited experience with Linux as well.

As I said in my reply to Mart, I believe I got the driver to install by mounting /proc in the chroot before running “apt install nvidia-driver” (the same six steps listed in that post). The node now PXE boots with the driver present, but nvidia-smi gives “command not found.”

I haven’t heard of the Nvidia-dkmsless driver you’re referring to, which isn’t surprising as I didn’t know anything about any of this a week ago. I’ll try to see if I can find it online. Right now I’m just wondering why nvidia-smi isn’t running, as this was the main reason I wanted to install the nvidia driver in the first place.

Thank you. Can’t believe I forgot that. Everything seems to be working as it should now. Thank you all for the help and suggestions.

Just ignore that murphy guy… glad you got it working.