Is it possible to run Docker containers with GPU support on DRIVE AGX (DRIVE OS 5.1.x or 5.2.x), i.e., using the --gpus all argument?
Background: Installing the nvidia-container-toolkit package (following the Installation Guide in the NVIDIA Cloud Native Technologies documentation) on a freshly flashed DRIVE AGX (with DRIVE OS 5.2.0) results in an error when starting Docker containers with GPU support.
Run hello-world with GPU support:
docker run -it --rm --init --gpus all hello-world
Error received:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Check the GPU info using nvidia-container-cli:
sudo nvidia-container-cli -k -d /dev/tty info
Output:
-- WARNING, the following logs are for debugging purposes only --
I1013 08:29:57.291185 487 nvc.c:372] initializing library context (version=1.5.1, build=4afad130c4c253abd3b2db563ffe9331594bda41)
I1013 08:29:57.291332 487 nvc.c:346] using root /
I1013 08:29:57.291365 487 nvc.c:347] using ldcache /etc/ld.so.cache
I1013 08:29:57.291518 487 nvc.c:348] using unprivileged user 65534:65534
I1013 08:29:57.291623 487 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1013 08:29:57.291861 487 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W1013 08:29:57.293471 487 nvc.c:254] failed to detect NVIDIA devices
I1013 08:29:57.293785 488 nvc.c:274] loading kernel module nvidia
E1013 08:29:57.295927 488 nvc.c:276] could not load kernel module nvidia
I1013 08:29:57.295961 488 nvc.c:292] loading kernel module nvidia_uvm
E1013 08:29:57.297779 488 nvc.c:294] could not load kernel module nvidia_uvm
I1013 08:29:57.297812 488 nvc.c:301] loading kernel module nvidia_modeset
E1013 08:29:57.299604 488 nvc.c:303] could not load kernel module nvidia_modeset
I1013 08:29:57.300266 489 driver.c:101] starting driver service
E1013 08:29:57.300793 489 driver.c:168] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I1013 08:29:57.301088 487 driver.c:203] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
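To make the failing steps in a log like the one above easier to spot, the warning and error lines can be filtered out. This is only a minimal sketch: nvc_problems is a hypothetical helper name, and it assumes the glog-style prefixes seen in the output (a severity letter followed by the date, e.g. "E1013").

```shell
#!/bin/sh
# Print only warning (W) and error (E) lines from an nvidia-container-cli
# debug log read on stdin. Assumes glog-style prefixes such as "E1013 08:29:...".
# nvc_problems is a hypothetical helper name, not part of any NVIDIA tool.
nvc_problems() {
    grep -E '^[EW][0-9]{4} '
}
```

One way to use it, assuming the debug log was written to a file (nvc-debug.log is a hypothetical name) instead of /dev/tty: run sudo nvidia-container-cli -k -d nvc-debug.log info, then nvc_problems < nvc-debug.log.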
Since the library libnvidia-ml.so.1 doesn’t exist in the DRIVE OS installation, I’m wondering whether nvidia-container-toolkit is supported on DRIVE OS at all.
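For anyone who wants to confirm the missing library on their own target, here is a rough sketch that looks for it in the dynamic linker cache. The function names lib_in_cache and check_nvml are hypothetical, and the script assumes ldconfig is on the PATH (on some systems it lives in /sbin and needs root or a full path).

```shell
#!/bin/sh
# Succeed if the named library is listed in `ldconfig -p` output read on stdin.
# Reading stdin means the check also works on output saved from the target.
lib_in_cache() {
    # Cache entries look like: "<tab>libfoo.so.1 (libc6,AArch64) => /path/libfoo.so.1"
    # so the library name is the first whitespace-separated field.
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

# Hypothetical wrapper: report whether NVML is registered on this machine.
check_nvml() {
    if ldconfig -p | lib_in_cache libnvidia-ml.so.1; then
        echo "libnvidia-ml.so.1 found"
    else
        echo "libnvidia-ml.so.1 missing"
    fi
}
```

Calling check_nvml on the target prints one of the two messages. A library that is installed but not registered in the cache would still be reported as missing, so this is a first check rather than proof.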
We have a demonstration project using some of our already existing containerized applications. One of those applications requires GPU capability.
Previously we worked on the NVIDIA Jetson AGX Xavier, which has its own version of nvidia-container-toolkit for L4T, and we ran our demo there successfully.
Now we are tasked with moving the same demo to DRIVE OS and have run into the issue described in my previous post.
Not to be pedantic, but Docker itself runs perfectly well on DRIVE OS. Even the known “docker exec” issue (see here: Docker exec fails on xavier) only occurs because the init scripts on DRIVE OS use chroot (instead of pivot_root, as the L4T init script does). That has an easy “nsenter” fix, which avoids running the whole system in a chroot environment to begin with.
Anyway, I guess the answer is that NVIDIA doesn’t provide the packages needed on DRIVE OS for containers to fully benefit from the underlying hardware capabilities, as it does for the Jetson product line.
Yes, you’re right.
When we said it’s not supported, we meant that the release doesn’t focus on container support on the target and doesn’t have any testing coverage for it.