Creating Containers Using nvidia-docker with AGX Xavier

I’m having trouble with nvidia-docker on an AGX Xavier. Every attempt to create a container fails with the following error:

nvidia@x02:~$ nvidia-docker run --rm nvidia/cuda nvcc --version
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.

The AGX Xavier I am using has the following specs:

  • JetPack version 4.3 (ARM64)
  • L4T R32.3.1
  • CUDA version 10.0.326
  • Ubuntu 18.04.3 LTS (Bionic Beaver)
  • Docker version 19.03.12
  • NVIDIA Docker version 2.0.3
  • nvidia-container-toolkit (= 1.0.1-1)
  • nvidia-docker2 (= 2.2.0-1)
  • nvidia-container-runtime (= 3.1.0-1)

I’ve looked up similar errors, but haven’t found a solution yet.

Hoping someone can help me with this one.

Thanks,

Paul

Looks like you’re trying to use nvidia/cuda, which is x86_64 (amd64), whereas the Jetson is arm64. There is nvidia/cuda-arm64, but if I remember correctly, that’s built using CUDA Toolkit 11.0, and when you try to create a container with it, it’ll throw an error saying that the Toolkit 10.2 on the Jetson isn’t compatible.

There are a few “l4t” containers on the nvidia container catalog such as l4t-base that’ll probably work for you.

EDIT: Just noticed you’re on JetPack 4.3, CUDA 10.0. The latest l4t-base might be on 10.2, so you might need to pull the ‘r32.3.1’ tag; your mileage may vary.
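
As a quick sanity check for the architecture mismatch described above, here is a minimal Python sketch. It assumes you have already read the image architecture yourself, for example with `docker image inspect --format '{{.Architecture}}' IMAGE`; the helper names `docker_arch` and `image_matches_host` are made up for illustration and are not part of any NVIDIA or Docker tooling.

```python
import platform

# Map kernel machine names (as reported by `uname -m`) to the
# architecture names Docker uses for images.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine=None):
    """Return the Docker-style architecture name for a machine string.

    With no argument, the current host's machine name is used.
    Unknown names are returned unchanged (lowercased).
    """
    machine = (machine or platform.machine()).lower()
    return _MACHINE_TO_DOCKER_ARCH.get(machine, machine)


def image_matches_host(image_arch, machine=None):
    """Return True if an image architecture matches the host architecture."""
    return docker_arch(image_arch) == docker_arch(machine)


if __name__ == "__main__":
    # Hypothetical values: an amd64 image on an aarch64 Jetson host.
    print(image_matches_host("amd64", "aarch64"))
```

On a Jetson, `uname -m` reports aarch64, so an image whose architecture is amd64 would fail this check before you ever try to start it.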

Hi,

As em202020 mentioned, please make sure you are using the r32.3.1 image.
https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base/tags

Please note that there is a dependency between the device OS and the Docker image.
You need to use an image built for the same L4T OS version as the device to ensure compatibility.
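
To make the version matching concrete, here is a small Python sketch that turns the L4T release string into an image tag such as r32.3.1. It assumes the first line of /etc/nv_tegra_release looks like `# R32 (release), REVISION: 3.1, ...`; please check the file on your own device, since that format is an assumption here and the function name `l4t_tag` is only illustrative.

```python
import re


def l4t_tag(release_line):
    """Derive an image tag like 'r32.3.1' from an L4T release line.

    Returns None if the line does not match the assumed format.
    """
    match = re.search(
        r"R(\d+)\s+\(release\),\s+REVISION:\s*([\d.]+)", release_line
    )
    if match is None:
        return None
    major, revision = match.group(1), match.group(2)
    return f"r{major}.{revision}"


if __name__ == "__main__":
    # Hypothetical sample line in the assumed format; on a device you
    # would read the first line of /etc/nv_tegra_release instead.
    sample = "# R32 (release), REVISION: 3.1, BOARD: t186ref"
    print(l4t_tag(sample))
```

The resulting tag can then be compared with the tags listed on the NGC page before pulling an image.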

Thanks.

Thanks for responding @em202020 and @AastaLLL!

Thank you for pointing out that I’m using an incompatible image. I did try to download the L4T-base image with the R32.3.1 tag, but I’m still getting the same error. Please see below:

nvidia@x02:~$ nvidia-docker run --rm nvcr.io/nvidia/l4t-base:r32.3.1 nvcc --version
Unable to find image 'nvcr.io/nvidia/l4t-base:r32.3.1' locally
r32.3.1: Pulling from nvidia/l4t-base
8aaa03d29a6e: Pull complete
e73d3a974854: Pull complete
2c14cdba18f5: Pull complete
23dd63c7659b: Pull complete
3bd414bd9504: Pull complete
cafd526eb263: Pull complete
483b0873e636: Pull complete
2568c5428ff2: Pull complete
6bcd9356d42f: Pull complete
c7f6d0180a4e: Pull complete
beddc9b83fb0: Pull complete
656f2307c79e: Pull complete
fe2e73a571b7: Pull complete
f5decba41c07: Pull complete
f0b6e413c48c: Pull complete
Digest: sha256:e8987d52ddb9496948e02656fc62d46561abce25bfe83203f4bc24c67e094578
Status: Downloaded newer image for nvcr.io/nvidia/l4t-base:r32.3.1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.

I’m still puzzled as to what’s causing the error. Any idea on what might be causing it?

I’m not sure whether this gives a clue about what’s going on, but here is the debug output from nvidia-container-cli in case it helps:

nvidia@x02:~$ nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0810 16:31:13.700287 4534 nvc.c:282] initializing library context (version=1.2.0, build=d22237acaea94aa5ad5de70aac903534ed598819)
I0810 16:31:13.700480 4534 nvc.c:256] using root /
I0810 16:31:13.700542 4534 nvc.c:257] using ldcache /etc/ld.so.cache
I0810 16:31:13.700567 4534 nvc.c:258] using unprivileged user 1000:1000
I0810 16:31:13.700852 4534 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0810 16:31:13.701280 4534 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0810 16:31:13.702186 4534 nvc.c:172] failed to detect NVIDIA devices
W0810 16:31:13.703118 4535 nvc.c:187] failed to set inheritable capabilities
W0810 16:31:13.703366 4535 nvc.c:188] skipping kernel modules load due to failure
I0810 16:31:13.704372 4536 driver.c:101] starting driver service
E0810 16:31:13.705560 4536 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0810 16:31:13.705991 4534 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
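
For longer logs, a rough Python sketch like the one below can pull out only the warning and error lines. It assumes each line starts with a severity letter (I, W, E or F) followed by a date, time, process ID and `file:line]`, as in the output above; the function name `problems` is made up for this example.

```python
import re

# One log line: severity letter, MMDD date, time, PID, "file:line]", message.
_LOG_LINE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([^\]]+)\] (.*)$"
)


def problems(log_text):
    """Return (severity, location, message) for every W, E or F line."""
    found = []
    for line in log_text.splitlines():
        match = _LOG_LINE.match(line.strip())
        if match and match.group(1) in "WEF":
            found.append((match.group(1), match.group(5), match.group(6)))
    return found


if __name__ == "__main__":
    # Two lines copied from the log above.
    log = (
        "I0810 16:31:13.704372 4536 driver.c:101] starting driver service\n"
        "E0810 16:31:13.705560 4536 driver.c:161] could not start driver "
        "service: load library failed: libnvidia-ml.so.1: cannot open "
        "shared object file: no such file or directory\n"
    )
    for severity, location, message in problems(log):
        print(severity, location, message)
```

In the log above, the only E line is the one from driver.c:161, which reports that libnvidia-ml.so.1 could not be loaded.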

Based on this link (https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base/tags), it seems that l4t-base only has a Linux/AMD64 variant for R32.3.1. That is probably why it doesn’t work on the Xavier, which has an ARM64 architecture. The latest tag, R32.4.3, has a Linux/ARM64 variant, but to use it I’d have to reflash my Xavier and lose all the files currently installed.

I also tried L4T TensorFlow, but it doesn’t have the R32.3.1 version. (https://ngc.nvidia.com/catalog/containers/nvidia:l4t-tensorflow)

Do you know if there is an available version of L4T TensorFlow for R32.3.1 that is archived? Maybe that will work.

I solved this issue by backing up the files on the Xavier and reflashing it with JetPack 4.4, which ships L4T R32.4.3, the version required by the NVIDIA Docker images I need.

nvidia@x02:~$ nvidia-docker run --rm nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

Thanks again to @em202020 and @AastaLLL for pointing me in the right direction.