Nvidia Runtime Components Not Detected in Docker Container on Jetson Orin NX (JetPack 6.2)
Issue:
When running a TensorRT test container, I get a CUDA driver error that prevents GPU functionality. I see this issue with other samples as well, and I’d like to know if I’m missing any steps or configurations.
Command and Error Output
Run Command:
sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix \
  nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel \
  /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx
Output:
==========
== CUDA ==
==========
CUDA Version 12.6.11
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
...
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
... (TensorRT initialization logs) ...
[02/25/2025-00:18:29] [I] === Device Information ===
Cuda failure: CUDA driver version is insufficient for CUDA runtime version
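Before digging into trtexec itself, a minimal check I can run is to see whether the nvidia runtime mounts the driver libraries into the container at all (a sketch; the library names are my assumption of the usual Tegra driver libraries, not something I have confirmed for this image):

sudo docker run --rm --runtime nvidia nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel \
  bash -c "ldconfig -p | grep -E 'libcuda|libnvrm' || echo 'driver libraries not mounted'"

If that grep comes back empty, the problem would be in the container runtime setup rather than in TensorRT.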
Environment Details
Jetson System Info:
sudo jetson_release -v
Software part of jetson-stats 4.3.1 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Jetson Orin NX Engineering Reference Developer Kit - Jetpack 6.2 [L4T 36.4.3]
NV Power Mode[3]: 25W
...
Platform:
- Machine: aarch64
- System: Linux
- Distribution: Ubuntu 22.04 Jammy Jellyfish
- Release: 5.15.148-tegra
...
Libraries:
- CUDA: 12.6.68
- cuDNN: 9.3.0.75
- TensorRT: 10.3.0.30
- VPI: 3.2.4
- Vulkan: 1.3.204
- OpenCV: 4.5.4 - with CUDA: NO
Docker Version:
sudo docker --version
Docker version 27.5.1, build 9f9e405
Docker Info:
sudo docker info
Client: Docker Engine - Community
Version: 27.5.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.20.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.32.4
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 6
Server Version: 27.5.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
runc version: v1.2.4-0-g6c52b3f
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.148-tegra
Operating System: Ubuntu 22.04.5 LTS
OSType: linux
Architecture: aarch64
CPUs: 8
Total Memory: 15.29GiB
Name: tegra
ID: 51e2f920-f0e6-444c-9d28-e68d18ad6e36
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Docker Daemon Configuration:
cat /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
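After editing daemon.json I restarted the daemon and re-checked that the nvidia runtime is registered and set as default (shown here as the sequence I used, for completeness):

sudo systemctl restart docker
sudo docker info | grep -i runtime   # expect "Runtimes: ... nvidia ..." and "Default Runtime: nvidia"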
NVIDIA Container Toolkit:
dpkg -l | grep nvidia-container-toolkit
ii nvidia-container-toolkit 1.16.2-1 arm64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.16.2-1 arm64 NVIDIA Container Toolkit Base
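In case it matters, this is how I would re-apply the toolkit's Docker configuration and check that the Jetson CSV mount lists are present (a sketch; the host-files-for-container.d path is my assumption of where the L4T CSV files normally live):

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
ls /etc/nvidia-container-runtime/host-files-for-container.d/   # assumed location of l4t.csv and related files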
Summary of Steps Taken
- Docker Downgrade: Used the jetsonhacks/install-docker script to downgrade Docker to v27.5.1.
- NVIDIA Container Toolkit: Version 1.16.2-1 is installed and configured, and containers are launched with the runtime flags (--runtime=nvidia --gpus all).
- TensorRT Container: Running the container nvcr.io/nvidia/l4t-tensorrt:r10.3.0-devel.
- Error Encountered: The container logs warn that the NVIDIA driver was not detected, and TensorRT fails with a CUDA driver version error.
Questions & Request for Help
- Driver Compatibility: My host shows NVIDIA driver version 540.4.0 (via nvidia-smi and /proc/driver/nvidia/version), which should support CUDA 12.6. Is there a known compatibility issue with JetPack 6.2 that could cause the container to not detect the driver?
- Container Toolkit Configuration: Are there additional configurations or troubleshooting steps to ensure that the NVIDIA Container Toolkit properly binds the host driver into the container?
- Additional Diagnostics: What further logs or tests should I check to diagnose why the container cannot access the NVIDIA driver even though the host driver appears correct? (One idea I had is sketched below.)
I want to run CUDA, TensorRT, VPI, etc., inside containers without GPU functionality issues. Any suggestions or additional steps I might be missing would be greatly appreciated.
Thanks in advance for your help!