DRIVE OS and support for the Nvidia Container Toolkit

Is it possible to run Docker containers on DRIVE AGX (DRIVE OS 5.1.x or 5.2.x) with GPU support (i.e., using the --gpus all argument)?

Background: Installing the nvidia-container-toolkit package (by following the instructions here: Installation Guide — NVIDIA Cloud Native Technologies documentation) on a freshly flashed DRIVE AGX (with DRIVE OS 5.2.0) results in an error when starting docker containers with GPU support.

Steps followed on a freshly flashed DRIVE AGX:

  1. Install nvidia-container-toolkit as follows:
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
      && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt update
    sudo apt install -y nvidia-container-toolkit

  2. Run hello-world with GPU support
    docker run -it --rm --init --gpus all hello-world
    Error received:
    docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

  3. Check the GPU info using nvidia-container-cli
    sudo nvidia-container-cli -k -d /dev/tty info
    Output:
    -- WARNING, the following logs are for debugging purposes only --
    I1013 08:29:57.291185 487 nvc.c:372] initializing library context (version=1.5.1, build=4afad130c4c253abd3b2db563ffe9331594bda41)
    I1013 08:29:57.291332 487 nvc.c:346] using root /
    I1013 08:29:57.291365 487 nvc.c:347] using ldcache /etc/ld.so.cache
    I1013 08:29:57.291518 487 nvc.c:348] using unprivileged user 65534:65534
    I1013 08:29:57.291623 487 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
    I1013 08:29:57.291861 487 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
    W1013 08:29:57.293471 487 nvc.c:254] failed to detect NVIDIA devices
    I1013 08:29:57.293785 488 nvc.c:274] loading kernel module nvidia
    E1013 08:29:57.295927 488 nvc.c:276] could not load kernel module nvidia
    I1013 08:29:57.295961 488 nvc.c:292] loading kernel module nvidia_uvm
    E1013 08:29:57.297779 488 nvc.c:294] could not load kernel module nvidia_uvm
    I1013 08:29:57.297812 488 nvc.c:301] loading kernel module nvidia_modeset
    E1013 08:29:57.299604 488 nvc.c:303] could not load kernel module nvidia_modeset
    I1013 08:29:57.300266 489 driver.c:101] starting driver service
    E1013 08:29:57.300793 489 driver.c:168] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
    I1013 08:29:57.301088 487 driver.c:203] driver service terminated successfully
    nvidia-container-cli: initialization error: driver error: failed to process request

Since the library libnvidia-ml.so.1 doesn’t exist in the DRIVE OS installation, I’m wondering whether nvidia-container-toolkit is supported on DRIVE OS at all.
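
The two failures in the log above (the missing libnvidia-ml.so.1 and the kernel modules that could not be loaded) can be checked directly on the target. A minimal sketch, assuming ldconfig and /proc/modules are available on the board; the helper names lib_in_ldcache and module_loaded are made up for illustration and are not part of any NVIDIA tooling:

```shell
#!/bin/sh
# Hypothetical helpers (not part of any NVIDIA tooling).

# Print "found" if the named library appears in the dynamic linker cache,
# "missing" otherwise (also when ldconfig itself is unavailable).
lib_in_ldcache() {
    if ldconfig -p 2>/dev/null | grep -qF "$1"; then
        echo "found"
    else
        echo "missing"
    fi
}

# Print "loaded" if the named kernel module is listed in /proc/modules.
module_loaded() {
    if grep -q "^$1 " /proc/modules 2>/dev/null; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Check the library and the modules named in the nvidia-container-cli log.
echo "libnvidia-ml.so.1: $(lib_in_ldcache libnvidia-ml.so.1)"
for mod in nvidia nvidia_uvm nvidia_modeset; do
    echo "$mod: $(module_loaded "$mod")"
done
```

Given the log, the library line would be expected to report "missing" on this DRIVE OS 5.2.0 installation, which matches the "could not start driver service" error.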

Dear @mohammed.hashem,
May I know why you are looking for a Docker setup on the target?

We have a demonstration project that uses some of our existing containerized applications. One of those applications requires GPU capability.
Previously we worked on the NVIDIA Jetson AGX Xavier, which has its own version of nvidia-container-toolkit for L4T, and we ran our demo there successfully.
Now we are tasked with moving the same demo to DRIVE OS and ran into the issue described in my previous post.

Dear @mohammed.hashem,
Docker is not supported on the DRIVE platform.

Hi, @mohammed.hashem

Also, below is the information about Docker support (from the DRIVE OS 5.2.6 Release Notes (PDF)).

[image: excerpt from the DRIVE OS 5.2.6 Release Notes on Docker support]

Well, not to be pedantic, but Docker actually runs perfectly well on DRIVE OS. Even the known “docker exec” issue (see here: Docker exec fails on xavier) occurs only because the init scripts on DRIVE OS use chroot (instead of pivot_root, as the L4T init script does), and there is an easy “nsenter” fix that avoids running the whole system in a chroot environment to begin with.
Anyway, I guess the answer is that NVIDIA doesn’t provide the packages needed for containers on DRIVE OS to fully benefit from the underlying hardware capabilities, as it does for the Jetson product line.
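
For anyone hitting the “docker exec” problem mentioned above, one possible workaround is to enter the container's namespaces with nsenter instead of using docker exec. This is only a sketch of that idea and not necessarily the fix referred to in this post (which may instead change the init script). The function names are hypothetical, and it assumes util-linux nsenter, sudo and the docker CLI are present on the target:

```shell
#!/bin/sh
# Hypothetical helpers; not taken from DRIVE OS or the linked thread.

# Build the nsenter command line for a given container init PID.
# It joins the mount, UTS, IPC, network and PID namespaces of that process.
nsenter_cmd() {
    echo "nsenter --target $1 --mount --uts --ipc --net --pid"
}

# Enter a running container (name or ID) and start a program in it.
# $1: container name or ID, $2: program to run (defaults to /bin/sh).
enter_container() {
    # Ask Docker for the PID of the container's init process on the host.
    pid=$(docker inspect --format '{{.State.Pid}}' "$1") || return 1
    # Word splitting of the command substitution is intentional here.
    sudo $(nsenter_cmd "$pid") "${2:-/bin/sh}"
}
```

With these defined, `enter_container my_container` (the container name is a placeholder) would open a shell inside the container. Whether this sidesteps the chroot-related failure on a given DRIVE OS release is untested here.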

Yes, you’re right.
When we said it’s not supported, we meant that the release doesn’t focus on target container support and has no testing coverage for it.