I’m setting up an Akash provider node on Ubuntu 24.04 with an NVIDIA RTX 3090 GPU. On the host machine, everything works fine:
`nvidia-smi` returns the expected GPU information (driver: 575.64.03, CUDA version: 12.9).
However, inside Docker containers (using `--gpus all` and `--runtime=nvidia`), the behavior is inconsistent:
Container Behavior:
- When I run:

```bash
docker run --rm --gpus all --runtime=nvidia nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi
```
I get:

```
/bin/bash: nvidia-smi: command not found
```
- When I run CUDA workload tests like:

```bash
docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```
It works fine; the GPU is utilized correctly.
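For extra context, here are the host-side checks I can run to see how the runtime is wired up. This is a diagnostic sketch: `nvidia-container-cli` ships with `libnvidia-container` (installed alongside `nvidia-container-toolkit`), and the exact output will vary by install.

```bash
# Confirm Docker has the "nvidia" runtime registered
docker info --format '{{json .Runtimes}}'

# Show what the NVIDIA container library reports about the host driver
nvidia-container-cli info

# List the driver files the toolkit is expected to inject into containers;
# nvidia-smi should appear here if injection is set up correctly
nvidia-container-cli list | grep -i nvidia-smi
```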
What I’ve Verified:
- NVIDIA drivers (575.64.03) are correctly installed on the host.
- `nvidia-smi` works on the host without any issue.
- Docker is using `nvidia-container-toolkit` and the `--gpus all` runtime.
- I can run CUDA programs in containers, but `nvidia-smi` is missing inside containers.
- I also tried official images like `nvidia/cuopt`, `nvidia/cuda`, and `nvcr.io/nvidia/k8s/cuda-sample`.
❓ What I Need Help With:
- Is it required for `nvidia-smi` to work inside containers for Akash GPU workloads to function properly?
- Is there a known workaround to get `nvidia-smi` working inside containers (do I need to mount host binaries like `/usr/bin/nvidia-smi` into the container)?
- Can I safely ignore this if CUDA programs are running fine?
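On the second question, this is the untested bind-mount sketch I would try. The library path and version suffix are assumptions based on my driver (575.64.03) on Ubuntu; normally the container toolkit should inject these files itself, so this would only be a stopgap.

```bash
# Untested workaround sketch: bind-mount the host nvidia-smi binary and the
# NVML library it links against (paths assumed for driver 575.64.03)
docker run --rm --gpus all --runtime=nvidia \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.575.64.03:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:ro \
  nvidia/cuda:12.9.0-base-ubuntu20.04 nvidia-smi
```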