Docker image for deepstream and pytorch

As the title says, I’m trying to create a Docker image with both DeepStream and PyTorch, but so far without success.

My system setup: Jetson AGX with a clean jetpack 5.1.
My first attempt was to merge two images in a multi-stage Dockerfile:
FROM nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3
FROM nvcr.io/nvidia/deepstream-l4t:5.1-21.02-sample

But this did not work. I guess it’s because the first image uses JP 5.0 and the second 5.1.

I then tried to use the deepstream docker container as my starting point and then install pytorch.

FROM nvcr.io/nvidia/deepstream-l4t:5.1-21.02-samples
RUN pip3 install Cython
RUN pip3 install numpy

RUN mkdir torch_install
RUN wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch_install/torch-1.8.0-cp36-cp36m-linux_aarch64.whl
RUN apt-get install python3-pip libopenblas-base libopenmpi-dev -y
RUN cd torch_install && pip3 install torch-1.8.0-cp36-cp36m-linux_aarch64.whl && cd …

RUN apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev -y
RUN git clone --branch v0.9.0 https://github.com/pytorch/vision /opt/nvidia/deepstream/deepstream-5.1/sources/torchvision
RUN pip3 install PyYAML tqdm
RUN pip3 install requests
RUN pip3 install onnx pycuda
RUN apt-get install libopenblas-dev -y

RUN export BUILD_VERSION=0.9.0 && \
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && \
python3 setup.py install

But this gives the error:

Step 40/40 : RUN export BUILD_VERSION=0.9.0 && export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && python3 setup.py install
 ---> Running in a29f9103cbee
Traceback (most recent call last):
  File "setup.py", line 12, in <module>
    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 195, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 148, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
The command '/bin/sh -c export BUILD_VERSION=0.9.0 && export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/aarch64-linux/lib && python3 setup.py install' returned a non-zero code: 1

I then tried commenting out the line “python3 setup.py install” for the torchvision installation, starting the container, and running that command manually.

This succeeds! It’s possible to install torchvision that way.

I would like to understand why the command fails in the Dockerfile but succeeds when I run it inside the container.
My guess is that I have access to CUDA devices while running the container but not while building the image.
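
One way to test that guess is to check, in both environments, whether the dynamic loader can open the library named in the traceback. The following is only a minimal sketch: `can_load` is a hypothetical helper (not part of PyTorch), and it mirrors the `ctypes.CDLL` call shown in the traceback rather than anything torchvision-specific.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        # Same kind of call that torch/__init__.py makes in the traceback.
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False


# libcurand.so.10 is the library reported missing in the build log.
for name in ("libcurand.so.10",):
    print(name, "loadable" if can_load(name) else "NOT loadable")
```

Running this script once from a `RUN python3 ...` step in the Dockerfile and once inside the started container should show whether the library is visible in one environment but not the other.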

How do I change my dockerfile so it can install torchvision?

Hi,

Please note that l4t-pytorch:r32.5.0-pth1.7-py3 indicates that the L4T version is r32.5.
But in deepstream-l4t:5.1-21.02-sample, 5.1 is the DeepStream library version and is not related to L4T.
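
To make the naming difference concrete, here is a small sketch that splits the two image references and labels the leading tag field according to the convention described above. The helpers `split_image_ref` and `describe` are hypothetical and only encode this reply's rule of thumb; they are not an official NGC tag parser, and they assume the reference always contains a tag.

```python
def split_image_ref(ref):
    """Split 'registry/path/name:tag' into (name, tag)."""
    path, _, tag = ref.rpartition(":")
    name = path.rsplit("/", 1)[-1]
    return name, tag


def describe(ref):
    """Label the first tag field using the convention from this thread."""
    name, tag = split_image_ref(ref)
    lead = tag.split("-")[0]
    if name.startswith("l4t-"):
        # l4t-* images: the leading field is the L4T release.
        return f"{name}: L4T release {lead}"
    if name == "deepstream-l4t":
        # deepstream-l4t: the leading field is the DeepStream version.
        return f"{name}: DeepStream {lead} (not an L4T release)"
    return f"{name}: unknown convention"


for ref in (
    "nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3",
    "nvcr.io/nvidia/deepstream-l4t:5.1-21.02-samples",
):
    print(describe(ref))
```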

We can build PyTorch and torchvision from a Dockerfile.
You can find an example below:

Could you try it with the base image changed to the DeepStream container and see if it works?

Thanks.

I’ve tried to use that Dockerfile as a base but get the same error. It uses an old version of PyTorch, and many of its packages do not exist for JetPack 4.5.1.

You also mention that the DeepStream image is not related to L4T, but then why does it have l4t in the image name?

Hi,

Since DeepStream supports both Jetson and desktop, the l4t tag is used to distinguish the target environment.

To run the Dockerfile on JetPack 4.5.1, please update the corresponding package versions based on the topic below:

Thanks.

Thanks, but I have followed the exact steps you suggest, and they work when I run them inside the container but not when I put them in a Dockerfile. (I tried to explain this in my original post.) When I follow these steps in the Dockerfile, I get the error described above.

Hi,

In general, we can get torchvision installed with the PyTorch base.
Let us try it with the DeepStream base and share more information with you later.

Thanks.

Hi,

Sorry for the late update.
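The OSError: libcurand.so.10 can be solved by setting nvidia as the default Docker runtime, so that docker build also runs with the NVIDIA runtime and can see the CUDA libraries.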

1. Edit /etc/docker/daemon.json with the following patch and reboot:

diff --git a/daemon.json b/daemon.json
index ad77732..9afc625 100644
--- a/daemon.json
+++ b/daemon.json
@@ -4,5 +4,7 @@
             "path": "nvidia-container-runtime",
             "runtimeArgs": []
         }
-    }
+    },
+
+    "default-runtime": "nvidia"
 }

2. We can build torchvision within deepstream-l4t:5.1-21.02-samples as below:
Dockerfile (849 Bytes)

$ sudo docker build .
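
For reference, the change made by the patch in step 1 can also be sketched programmatically. This is only an illustration under stated assumptions: `with_default_runtime` is a hypothetical helper, the input dict mirrors only the part of daemon.json visible in the diff, and the sketch prints the result instead of writing to /etc/docker/daemon.json.

```python
import json


def with_default_runtime(config, runtime="nvidia"):
    """Return a copy of a daemon.json config with "default-runtime" set."""
    if runtime not in config.get("runtimes", {}):
        raise ValueError(f"runtime {runtime!r} is not registered under 'runtimes'")
    updated = dict(config)  # shallow copy, the input dict is left unchanged
    updated["default-runtime"] = runtime
    return updated


# Example input mirroring the runtime entry shown in the patch.
original = {
    "runtimes": {
        "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
    }
}

patched = with_default_runtime(original)
print(json.dumps(patched, indent=4))
```

The printed JSON would still need to be saved to the real file with root permissions before the change takes effect, as described in step 1.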

Thanks.