I’m setting up an Akash provider node on Ubuntu 24.04 with an NVIDIA RTX 3090 GPU. On the host machine, everything works fine: `nvidia-smi` returns the expected GPU information (driver 575.64.03, CUDA version 12.9). However, inside Docker containers (using `--gpus all` and `--runtime=nvidia`), `nvidia-smi` is not available.
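For context, the NVIDIA runtime was wired into Docker along these lines (standard NVIDIA Container Toolkit steps; my exact commands may have differed slightly):

```bash
# Standard toolkit install + Docker runtime registration
# (assumes the nvidia-container-toolkit apt repository is already configured)
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: "nvidia" should appear among the listed runtimes
docker info | grep -i runtime
```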
Container Behavior:
- When I run:

```bash
docker run --rm --gpus all --runtime=nvidia nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi
```

I get:

```bash
/bin/bash: nvidia-smi: command not found
```
- When I run CUDA workload tests like:

```bash
docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```

it works fine; the GPU is utilized correctly.
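Regarding the failing `nvidia-smi` case above: my understanding is that the toolkit only injects `nvidia-smi` and `libnvidia-ml` into a container when the `utility` driver capability is enabled, so one variant I was planning to test (not yet confirmed on my side) is:

```bash
# Explicitly request the "utility" driver capability, which (as I understand it)
# is what makes the toolkit mount nvidia-smi and libnvidia-ml into the container
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
  nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi

# Equivalent form using the --gpus capabilities syntax
docker run --rm --gpus 'all,capabilities=utility' \
  nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi
```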
What I’ve Verified:
- NVIDIA drivers (575.64.03) are correctly installed on the host; `nvidia-smi` works on the host without any issue.
- Docker is using `nvidia-container-toolkit` with `--gpus all` and the `nvidia` runtime.
- I can run CUDA programs in containers, but `nvidia-smi` is missing inside containers.
- I also tried official images like `nvidia/cuopt`, `nvidia/cuda`, and `nvcr.io/nvidia/k8s/cuda-sample`.
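The verification above boils down to roughly these host-side checks (paraphrased from memory; paths and package names assume a standard Ubuntu install):

```bash
nvidia-smi                                # reports driver 575.64.03, CUDA 12.9
dpkg -l | grep nvidia-container-toolkit   # toolkit package is installed
nvidia-ctk --version                      # toolkit CLI is present
cat /etc/docker/daemon.json               # "nvidia" runtime is registered
```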
❓ What I Need Help With:
- Is it required for `nvidia-smi` to work inside containers for Akash GPU workloads to function properly?
- Is there a known workaround to get `nvidia-smi` working inside containers (do I need to mount host binaries like `/usr/bin/nvidia-smi` into the container)? See the sketch below for what I mean.
- Can I safely ignore this if CUDA programs are running fine?
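To make the second question concrete, this is the kind of bind-mount workaround I had in mind but have not actually tried; the library path is an assumption based on a standard Ubuntu x86_64 driver install:

```bash
# Untested idea: mount the host nvidia-smi plus the NVML library it links against
docker run --rm --gpus all \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:ro \
  nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi
```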