As the title says, I'm trying to create a Docker image with both DeepStream and PyTorch, but so far I have not succeeded.
My system setup: Jetson AGX with a clean JetPack 5.1.
My first try was to merge the two images in a multi-stage Dockerfile:
FROM nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3
FROM nvcr.io/nvidia/deepstream-l4t:5.1-21.02-sample
But this did not work. I guess it's because the first image uses jp5.0 and the second 5.1.
I then tried to use the DeepStream container as my starting point and install PyTorch on top of it:
FROM nvcr.io/nvidia/deepstream-l4t:5.1-21.02-samples
RUN pip3 install Cython
RUN pip3 install numpy
RUN mkdir torch_install
RUN wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch_install/torch-1.8.0-cp36-cp36m-linux_aarch64.whl
RUN apt-get install python3-pip libopenblas-base libopenmpi-dev -y
RUN cd torch_install && pip3 install torch-1.8.0-cp36-cp36m-linux_aarch64.whl && cd ..
RUN apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev -y
RUN git clone --branch v0.9.0 https://github.com/pytorch/vision /opt/nvidia/deepstream/deepstream-5.1/sources/torchvision
RUN pip3 install PyYAML tqdm
RUN pip3 install requests
RUN pip3 install onnx pycuda
RUN apt-get install libopenblas-dev -y
RUN export BUILD_VERSION=0.9.0 && \
    export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && \
    python3 setup.py install
But this gives the error:
Step 40/40 : RUN export BUILD_VERSION=0.9.0 && export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && python3 setup.py install
 ---> Running in a29f9103cbee
Traceback (most recent call last):
  File "setup.py", line 12, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 195, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 148, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
The command '/bin/sh -c export BUILD_VERSION=0.9.0 && export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && python3 setup.py install' returned a non-zero code: 1
I then tried commenting out the line "python3 setup.py install" for the torchvision installation, building the image, starting the container, and running that command manually.
This succeeds! It is possible to install torchvision that way.
I would like to understand why the command fails in the Dockerfile but succeeds when I run it inside the running container.
My guess is that I have access to the CUDA devices while running the container but not while building the image.
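To test this guess, a small script like the one below could be run twice: once from a RUN step during the build and once inside the running container. It is only a sketch. The helper names `can_load` and `find_in_dirs` are made up for illustration, the library name is the one from the OSError above, and the search directory is the one from my LD_LIBRARY_PATH export.

```python
import ctypes
import os


def can_load(lib_name):
    """Try to open a shared library with ctypes.CDLL, as in the traceback above.

    Returns (True, "loaded") on success or (False, <error text>) on failure.
    """
    try:
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return True, "loaded"
    except OSError as exc:
        return False, str(exc)


def find_in_dirs(lib_name, dirs):
    """Return the directories in `dirs` that contain a file named `lib_name`."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, lib_name))]


if __name__ == "__main__":
    lib = "libcurand.so.10"  # library named in the OSError
    # Assumption: the CUDA directory exported in the Dockerfile, plus
    # whatever is already on LD_LIBRARY_PATH in this environment.
    search = ["/usr/local/cuda-10.2/targets/aarch64-linux/lib"]
    search += [p for p in os.environ.get("LD_LIBRARY_PATH", "").split(":") if p]
    print("can load:", can_load(lib))
    print("found in:", find_in_dirs(lib, search))
```

If the library cannot be loaded during the build but loads fine in the running container, that would suggest it only becomes visible at run time, which would fit the guess above.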
How do I change my Dockerfile so that torchvision can be installed during the build?