Container hosting on AGX Xavier with JetPack 5.0

I have an AGX Xavier development kit that I’m planning to upgrade to JetPack 5.0. If I do, will I be able to host containers that are built on a JetPack 4.6.1 base? Will the CUDA tools mounted into the container work?

Quick answer: based on an early out-of-the-box attempt, no.

$ docker run --gpus all --runtime nvidia --network host --rm -it nvcr.io/nvidia/l4t-ml:r32.7.1-py3

and then from Jupyter:

# python3
Python 3.6.9 (default, Dec  8 2021, 21:08:43) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-04-10 00:37:34.001877: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda-10.2/targets/aarch64-linux/lib:
2022-04-10 00:37:34.001983: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-04-10 00:37:34.002319: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda-10.2/targets/aarch64-linux/lib:
2022-04-10 00:37:34.002372: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Segmentation fault (core dumped)

I’ll try to see if I can fix that by manually installing CUDA 10.2 into the container…

That’s what I figured. Looks like some of my projects go on hold until the chip shortage is over. :-(

I’m currently trying to see if copying over those .so files will work. In theory it should, since the CUDA driver ABI remains compatible with older cudart versions.
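A quick way to sanity-check that from inside the container is to confirm that the driver library is still passed through by the nvidia runtime while the 10.2 runtime libraries are absent (a rough check; exact library locations can vary between L4T releases):

$ find / -name 'libcuda.so*' 2>/dev/null        # driver library, mounted in from the host
$ find / -name 'libcudart.so.10.2*' 2>/dev/null # CUDA 10.2 runtime; turns up nothing on a JetPack 5.0 host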

I’ll keep you posted on whether it works…

I think that I have it working properly now for both PyTorch and TensorFlow.

(Dockerfile below)

FROM nvcr.io/nvidia/l4t-ml:r32.7.1-py3

# Pull the CUDA 10.2 user-space libraries (and cuDNN 8) straight from the NVIDIA Jetson
# apt repo, since a JetPack 5.0 host no longer mounts them into the container.

# CUDA runtime (libcudart)
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/c/cuda-cudart/cuda-cudart-10-2_10.2.300-1_arm64.deb && dpkg-deb -x cuda-cudart-10-2_10.2.300-1_arm64.deb cudart && rm cuda-cudart-10-2_10.2.300-1_arm64.deb && cp -r cudart/usr/local/cuda-10.2/targets/aarch64-linux/lib/* /usr/local/cuda-10.2/targets/aarch64-linux/lib && rm -rf cudart

# cuRAND
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/libc/libcurand/libcurand-10-2_10.1.2.300-1_arm64.deb && dpkg-deb -x libcurand-10-2_10.1.2.300-1_arm64.deb curand && rm libcurand-10-2_10.1.2.300-1_arm64.deb && cp -r curand/usr/local/cuda-10.2/targets/aarch64-linux/lib/* /usr/local/cuda-10.2/targets/aarch64-linux/lib && rm -rf curand

# cuFFT
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/libc/libcufft/libcufft-10-2_10.1.2.300-1_arm64.deb && dpkg-deb -x libcufft-10-2_10.1.2.300-1_arm64.deb cufft && rm libcufft-10-2_10.1.2.300-1_arm64.deb && cp -r cufft/usr/local/cuda-10.2/targets/aarch64-linux/lib/* /usr/local/cuda-10.2/targets/aarch64-linux/lib && rm -rf cufft

# cuBLAS
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/libc/libcublas/libcublas10_10.2.3.300-1_arm64.deb && dpkg-deb -x libcublas10_10.2.3.300-1_arm64.deb cublas && rm libcublas10_10.2.3.300-1_arm64.deb && cp -r cublas/usr/local/cuda-10.2/targets/aarch64-linux/lib/* /usr/local/cuda-10.2/targets/aarch64-linux/lib && rm -rf cublas

# cuDNN 8 (the extracted package tree is laid out relative to /, so overlay the whole tree)
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/c/cudnn/libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb && dpkg-deb -x libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb cudnn && rm libcudnn8_8.2.1.32-1+cuda10.2_arm64.deb && cp -rf cudnn/* / && rm -rf cudnn

# NVTX
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/c/cuda-nvtx/cuda-nvtx-10-2_10.2.300-1_arm64.deb && dpkg-deb -x cuda-nvtx-10-2_10.2.300-1_arm64.deb nvtx && rm cuda-nvtx-10-2_10.2.300-1_arm64.deb && cp -rf nvtx/* / && rm -rf nvtx

# cuSPARSE
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/libc/libcusparse/libcusparse-10-2_10.3.1.300-1_arm64.deb && dpkg-deb -x libcusparse-10-2_10.3.1.300-1_arm64.deb cusparse && rm libcusparse-10-2_10.3.1.300-1_arm64.deb && cp -rf cusparse/* / && rm -rf cusparse

# cuSOLVER
RUN wget https://repo.download.nvidia.com/jetson/common/pool/main/libc/libcusolver/libcusolver-10-2_10.3.0.300-1_arm64.deb && dpkg-deb -x libcusolver-10-2_10.3.0.300-1_arm64.deb cusolver && rm libcusolver-10-2_10.3.0.300-1_arm64.deb && cp -r cusolver/* / && rm -rf cusolver
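To sanity-check a build like this, running both frameworks against the GPU from the rebuilt image should be enough (the image tag here is just an example):

$ docker build -t l4t-ml-cuda102:r32.7.1 .
$ docker run --runtime nvidia --network host --rm -it l4t-ml-cuda102:r32.7.1 \
    python3 -c "import torch; print(torch.cuda.is_available())"
$ docker run --runtime nvidia --network host --rm -it l4t-ml-cuda102:r32.7.1 \
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"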

With the benefit of hindsight, I could simplify it a bit more now, but this version is simple enough to work reliably.

With the new L4T, the CUDA runtime-side libraries are no longer mounted into the container (and they wouldn’t be the matching version anyway), so we need to fetch them manually.

Thankfully they stopped mounting those libraries even for the current release in JetPack 5.0, so this won’t be an issue going forward.
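For reference, the csv-mode nvidia container runtime on Jetson decides what to pass through based on the files under /etc/nvidia-container-runtime/host-files-for-container.d/, so you can check on a given host whether any CUDA user-space libraries are still being mounted (paths as I understand them; worth verifying on your own system):

$ ls /etc/nvidia-container-runtime/host-files-for-container.d/
$ grep -h libcudart /etc/nvidia-container-runtime/host-files-for-container.d/*.csv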

Looks promising for my personal projects, but it’s not something I’d foist upon an open source community.

In hindsight, NVIDIA made a mistake by mounting those higher-level libraries into the guest, since they aren’t ABI-stable between CUDA releases. It was always going to break down at some point.

There’s essentially no fix that doesn’t involve shipping a CUDA 10.2 install on the new JetPack 5.0… or manually installing those libraries into the containers.

For the former option, this means installing CUDA 10.2 from JetPack 4.x on the host and then using -v to mount it as a volume at /usr/local/cuda-10.2 inside the container, which could either be documented or done automatically in that scenario.
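As a rough sketch of that first option, assuming CUDA 10.2 from the JetPack 4.x repos has been installed on the JetPack 5.0 host under /usr/local/cuda-10.2, the mount would look something like this:

$ docker run --runtime nvidia --network host --rm -it \
    -v /usr/local/cuda-10.2:/usr/local/cuda-10.2:ro \
    nvcr.io/nvidia/l4t-ml:r32.7.1-py3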

For the latter option, it takes only light changes to the container to fetch those libraries, which is what I’ve done in this case.
