This is my first time setting up the CUDA driver for a systemd-nspawn container: the host runs RHEL 8 and the container runs Ubuntu 20.04. nvidia-smi runs fine on the host, but not in the container:
ubuntu-box:~$ nvidia-smi
-bash: nvidia-smi: command not found
Here’s what I’ve done:
Installed the CUDA driver on the host:
[host]$ nvidia-smi
Wed Sep 21 17:36:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000001:00:00.0 Off |                  Off |
| N/A   32C    P0    35W / 250W |      0MiB / 16384MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
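I also checked that the device nodes I'm about to bind actually exist on the host (just a plain ls, output not pasted here):
[host]$ ls -l /dev/nvidia* /dev/dri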
Bound the NVIDIA devices in the container's .nspawn config as:
[Exec]
Boot=yes
PrivateUsers=no
Hostname=ubuntu-box
Capability=all
SystemCallFilter=add_key keyctl
[Network]
VirtualEthernet=no
[Files]
Bind=/sys/fs/cgroup
Bind=/dev/dri
Bind=/dev/nvidia0
Bind=/dev/nvidiactl
Bind=/dev/nvidia-modeset
Bind=/dev/nvidia-uvm
Bind=/dev/nvidia-uvm-tools
#Bind=/dev/nvidia-caps
Bind=/dev/input
Bind=/dev/shm
#Bind=/dev/input/js0
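For reference, that file is /etc/systemd/nspawn/ubuntu-box.nspawn on the host (path assumed from the container name), and after changing it I restart the container and check that the bound nodes show up inside:
[host]$ sudo machinectl reboot ubuntu-box
[host]$ sudo machinectl shell root@ubuntu-box /bin/sh -c 'ls -l /dev/nvidia*'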
Edited the device allow list for the container's service as:
[Service]
DeviceAllow=/dev/nvidiactl
DeviceAllow=/dev/nvidia0
DeviceAllow=/dev/nvidia-modeset
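In case it matters, that [Service] snippet sits in a drop-in for the nspawn unit, created with systemctl edit (the exact drop-in path is my assumption based on the unit name), followed by a reload and restart:
[host]$ sudo systemctl edit systemd-nspawn@ubuntu-box
# writes /etc/systemd/system/systemd-nspawn@ubuntu-box.service.d/override.conf
[host]$ sudo systemctl daemon-reload
[host]$ sudo systemctl restart systemd-nspawn@ubuntu-box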
And installed the same-version CUDA toolkit inside the container:
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run
I deselected the driver when running the runfile installer, and the installation looked successful.
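After the install I also added the toolkit to the container's PATH, following the installer's summary (paths assume the default /usr/local/cuda-11.7 prefix), and checked nvcc:
ubuntu-box:~$ export PATH=/usr/local/cuda-11.7/bin:$PATH
ubuntu-box:~$ export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
ubuntu-box:~$ nvcc --version
So as far as I can tell the toolkit itself is in place; it's only nvidia-smi that the container can't find.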
Can anyone help take a look at which part I've missed? Do I need to install the driver in the container as well?
Many thanks!