I’m having issues with nvidia-docker on my AGX Xavier. Whenever I try to create a container, I get the following error:
nvidia@x02:~$ nvidia-docker run --rm nvidia/cuda nvcc --version
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
The AGX Xavier I am using has the following specs:
Jetpack version 4.3 (ARM64)
L4T R32.3.1
CUDA version 10.0.326
Ubuntu 18.04.3 LTS (Bionic Beaver)
Docker version 19.03.12
NVIDIA Docker version 2.0.3
nvidia-container-toolkit (= 1.0.1-1)
nvidia-docker2 (= 2.2.0-1)
nvidia-container-runtime (= 3.1.0-1)
I’ve looked up similar errors but haven’t found a solution yet. For example, this report describes the same error:
I have configured Docker 19.03.6 and nvidia-docker successfully. But when I test:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
Then I checked nvidia-container-cli, and it seems to show no error:
sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0226 06:26:25.224982 78809 nvc.c:281] initializing library context (version=1.0.2, build=ff40da533db929bf515aca59ba4c701a65a35e6b)
I0226 06:26:25.225050 78809 nvc.c:255] using root /
I0226 06:26:25.225061 78809 nvc.c:256] using ldcache /etc/ld.so.cache
I0226 06:26:25.225071 78809 nvc.c:257] using unprivileged user 65534:65534
I0226 06:26:25.230611 78810 nvc.c:191] loading kernel module nvidia
I0226 06:26:25.230931 78810 nvc.c:203] loading kernel module nvidia_uvm
I0226 06:26:25.231053 78810 nvc.c:211] loading kernel module nvidia_modeset
I0226 06:26:25.231436 78811 driver.c:133] starting driver service
I0226 06:26:25.356687 78809 nvc_info.c:434] requesting driver information with ''
I0226 06:26:25.356983 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.418.87.00
I0226 06:26:25.357280 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.87.00
I0226 06:26:25.357333 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.87.00
I0226 06:26:25.357441 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.87.00
I0226 06:26:25.357512 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.87.00
I0226 06:26:25.357559 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.87.00
I0226 06:26:25.357629 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.87.00
I0226 06:26:25.357711 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.87.00
I0226 06:26:25.357760 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.87.00
I0226 06:26:25.357806 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.87.00
I0226 06:26:25.357868 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.00
I0226 06:26:25.357928 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.87.00
I0226 06:26:25.358002 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.87.00
I0226 06:26:25.358053 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.87.00
I0226 06:26:25.358108 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.87.00
I0226 06:26:25.358179 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.87.00
I0226 06:26:25.358606 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.418.87.00
I0226 06:26:25.358847 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.87.00
I0226 06:26:25.358902 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.87.00
I0226 06:26:25.358951 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.87.00
I0226 06:26:25.359001 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.87.00
W0226 06:26:25.359039 78809 nvc_info.c:303] missing compat32 library libnvidia-ml.so
W0226 06:26:25.359047 78809 nvc_info.c:303] missing compat32 library libnvidia-cfg.so
W0226 06:26:25.359056 78809 nvc_info.c:303] missing compat32 library libcuda.so
W0226 06:26:25.359066 78809 nvc_info.c:303] missing compat32 library libnvidia-opencl.so
W0226 06:26:25.359076 78809 nvc_info.c:303] missing compat32 library libnvidia-ptxjitcompiler.so
W0226 06:26:25.359086 78809 nvc_info.c:303] missing compat32 library libnvidia-fatbinaryloader.so
W0226 06:26:25.359097 78809 nvc_info.c:303] missing compat32 library libnvidia-compiler.so
W0226 06:26:25.359107 78809 nvc_info.c:303] missing compat32 library libvdpau_nvidia.so
W0226 06:26:25.359117 78809 nvc_info.c:303] missing compat32 library libnvidia-encode.so
W0226 06:26:25.359128 78809 nvc_info.c:303] missing compat32 library libnvidia-opticalflow.so
W0226 06:26:25.359138 78809 nvc_info.c:303] missing compat32 library libnvcuvid.so
W0226 06:26:25.359149 78809 nvc_info.c:303] missing compat32 library libnvidia-eglcore.so
W0226 06:26:25.359159 78809 nvc_info.c:303] missing compat32 library libnvidia-glcore.so
W0226 06:26:25.359169 78809 nvc_info.c:303] missing compat32 library libnvidia-tls.so
W0226 06:26:25.359177 78809 nvc_info.c:303] missing compat32 library libnvidia-glsi.so
W0226 06:26:25.359186 78809 nvc_info.c:303] missing compat32 library libnvidia-fbc.so
W0226 06:26:25.359194 78809 nvc_info.c:303] missing compat32 library libnvidia-ifr.so
W0226 06:26:25.359203 78809 nvc_info.c:303] missing compat32 library libGLX_nvidia.so
W0226 06:26:25.359212 78809 nvc_info.c:303] missing compat32 library libEGL_nvidia.so
W0226 06:26:25.359220 78809 nvc_info.c:303] missing compat32 library libGLESv2_nvidia.so
W0226 06:26:25.359253 78809 nvc_info.c:303] missing compat32 library libGLESv1_CM_nvidia.so
I0226 06:26:25.359527 78809 nvc_info.c:229] selecting /usr/bin/nvidia-smi
I0226 06:26:25.359560 78809 nvc_info.c:229] selecting /usr/bin/nvidia-debugdump
I0226 06:26:25.359585 78809 nvc_info.c:229] selecting /usr/bin/nvidia-persistenced
I0226 06:26:25.359608 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-control
I0226 06:26:25.359632 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-server
I0226 06:26:25.359667 78809 nvc_info.c:366] listing device /dev/nvidiactl
I0226 06:26:25.359676 78809 nvc_info.c:366] listing device /dev/nvidia-uvm
I0226 06:26:25.359687 78809 nvc_info.c:366] listing device /dev/nvidia-uvm-tools
I0226 06:26:25.359697 78809 nvc_info.c:366] listing device /dev/nvidia-modeset
W0226 06:26:25.359731 78809 nvc_info.c:274] missing ipc /var/run/nvidia-persistenced/socket
W0226 06:26:25.359753 78809 nvc_info.c:274] missing ipc /tmp/nvidia-mps
I0226 06:26:25.359763 78809 nvc_info.c:490] requesting device information with ''
I0226 06:26:25.366457 78809 nvc_info.c:520] listing device /dev/nvidia0 (GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 at 00000000:05:00.0)
I0226 06:26:25.373129 78809 nvc_info.c:520] listing device /dev/nvidia1 (GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 at 00000000:06:00.0)
I0226 06:26:25.380167 78809 nvc_info.c:520] listing device /dev/nvidia2 (GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 at 00000000:85:00.0)
I0226 06:26:25.387215 78809 nvc_info.c:520] listing device /dev/nvidia3 (GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb at 00000000:86:00.0)
NVRM version: 418.87.00
CUDA version: 10.1
Device Index: 0
Device Minor: 0
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883
Bus Location: 00000000:05:00.0
Architecture: 3.7
Device Index: 1
Device Minor: 1
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4
Bus Location: 00000000:06:00.0
Architecture: 3.7
Device Index: 2
Device Minor: 2
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-0687efee-81a2-537e-d7fe-3a5694aceb29
Bus Location: 00000000:85:00.0
Architecture: 3.7
Device Index: 3
Device Minor: 3
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb
Bus Location: 00000000:86:00.0
Architecture: 3.7
I0226 06:26:25.387330 78809 nvc.c:318] shutting down library context
I0226 06:26:25.388428 78811 driver.c:192] terminating driver service
I0226 06:26:25.440777 78809 driver.c:233] driver service terminated successfully
Is the NVIDIA driver version too low? In fact, 418.87.00 is the version the official NVIDIA site recommends. Also, how do I update the driver via apt instead of manually with the driver .run file?
I do not know how to make it work. Can anyone help me?
Hoping someone can help me with this one.
Thanks,
Paul
Looks like you’re trying to use nvidia/cuda, which is x86_64 (amd64), whereas the Jetson is arm64. There is nvidia/cuda-arm64, but if I remember correctly, that’s built against CUDA Toolkit 11.0, and when you try to create a container with it, it’ll throw an error saying that the Toolkit 10.2 on the Jetson isn’t compatible.
There are a few “l4t” containers in the NVIDIA container catalog, such as l4t-base, that’ll probably work for you.
EDIT: Just noticed you’re on JetPack 4.3 with CUDA 10.0. The latest l4t-base might be on 10.2, so you might need to pull the ‘r32.3.1’ tag; your mileage may vary.
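A quick way to confirm that mismatch (just a sketch, assuming the nvidia/cuda image has already been pulled locally):
# On the Jetson, the host architecture should print aarch64
uname -m
# Show which architecture a locally pulled image was built for
docker image inspect nvidia/cuda --format '{{.Architecture}}'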
Hi,
As em202020 mentioned, please make sure you are using the r32.3.1 image.
Please note that there are dependencies between the device OS and the Docker image.
You will need an image built for the same L4T OS version to ensure compatibility.
Thanks.
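To find which L4T release the device is running (so the matching image tag can be chosen), the release string can be read from /etc/nv_tegra_release on a standard JetPack install, e.g.:
# Prints something like: # R32 (release), REVISION: 3.1, ...
head -n 1 /etc/nv_tegra_release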
Thanks for responding, @em202020 and @AastaLLL!
Thank you for pointing out that I’m using an incompatible image. I did try to download the L4T-base image with the R32.3.1 tag, but I’m still getting the same error. Please see below:
nvidia@x02:~$ nvidia-docker run --rm nvcr.io/nvidia/l4t-base:r32.3.1 nvcc --version
Unable to find image 'nvcr.io/nvidia/l4t-base:r32.3.1' locally
r32.3.1: Pulling from nvidia/l4t-base
8aaa03d29a6e: Pull complete
e73d3a974854: Pull complete
2c14cdba18f5: Pull complete
23dd63c7659b: Pull complete
3bd414bd9504: Pull complete
cafd526eb263: Pull complete
483b0873e636: Pull complete
2568c5428ff2: Pull complete
6bcd9356d42f: Pull complete
c7f6d0180a4e: Pull complete
beddc9b83fb0: Pull complete
656f2307c79e: Pull complete
fe2e73a571b7: Pull complete
f5decba41c07: Pull complete
f0b6e413c48c: Pull complete
Digest: sha256:e8987d52ddb9496948e02656fc62d46561abce25bfe83203f4bc24c67e094578
Status: Downloaded newer image for nvcr.io/nvidia/l4t-base:r32.3.1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
I’m still puzzled as to what’s causing the error. Any idea on what might be causing it?
I’m not sure if this will give you a clue about what’s going on, but I thought it might help:
nvidia@x02:~$ nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0810 16:31:13.700287 4534 nvc.c:282] initializing library context (version=1.2.0, build=d22237acaea94aa5ad5de70aac903534ed598819)
I0810 16:31:13.700480 4534 nvc.c:256] using root /
I0810 16:31:13.700542 4534 nvc.c:257] using ldcache /etc/ld.so.cache
I0810 16:31:13.700567 4534 nvc.c:258] using unprivileged user 1000:1000
I0810 16:31:13.700852 4534 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0810 16:31:13.701280 4534 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0810 16:31:13.702186 4534 nvc.c:172] failed to detect NVIDIA devices
W0810 16:31:13.703118 4535 nvc.c:187] failed to set inheritable capabilities
W0810 16:31:13.703366 4535 nvc.c:188] skipping kernel modules load due to failure
I0810 16:31:13.704372 4536 driver.c:101] starting driver service
E0810 16:31:13.705560 4536 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0810 16:31:13.705991 4534 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
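For reference, these are the basic configuration checks worth running when nvidia-container-cli fails like this (just a sketch; the paths assume a default nvidia-docker2 install on L4T):
# Confirm which NVIDIA container packages are installed
dpkg -l | grep -i nvidia-container
# Confirm Docker is configured with the nvidia runtime
cat /etc/docker/daemon.json
docker info | grep -i runtime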
Based on the catalog page (NVIDIA L4T Base | NVIDIA NGC), it seems that l4t-base only has a linux/amd64 variant for the r32.3.1 tag. That’s probably why it doesn’t work on the Xavier, which is arm64. The latest tag, r32.4.3, has a linux/arm64 variant, but using it would mean reflashing my Xavier and losing all the files currently installed.
I also tried L4T TensorFlow, but it doesn’t have an r32.3.1 version (NVIDIA L4T TensorFlow | NVIDIA NGC).
Do you know if there is an archived version of L4T TensorFlow for r32.3.1? Maybe that would work.
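One way to check which architectures a given NGC tag actually publishes, without pulling it, is docker manifest inspect (a sketch; in Docker 19.03 this is an experimental CLI feature, hence the environment variable):
# Lists the platform entries (architecture/os) published for the tag
DOCKER_CLI_EXPERIMENTAL=enabled docker manifest inspect -v nvcr.io/nvidia/l4t-base:r32.3.1 | grep -i architecture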
Solved this issue by backing up the files currently on the Xavier and reflashing it with JetPack 4.4, which provides the L4T R32.4.3 required by the NVIDIA Docker images I need.
nvidia@x02:~$ nvidia-docker run --rm nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
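As an extra sanity check (just a sketch, assuming the l4t-tensorflow image ships Python 3 and TensorFlow 1.15 as its tag suggests), GPU visibility inside the container can be verified with:
# Should print True if the container can access the integrated GPU
nvidia-docker run --rm nvcr.io/nvidia/l4t-tensorflow:r32.4.3-tf1.15-py3 python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"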
Thanks again to @em202020 and @AastaLLL for pointing me in the right direction.
Hi @AastaLLL,
We have a problem with this Docker setup, as shown below: nvidia-docker always reports that it can’t find the driver. Please help check. Thanks.
kayccc
March 4, 2021, 5:14am
Hi rico.deng,
Please open a new topic if this is still an issue.
Thanks