Docker container cannot find CUDA libraries (libcurand.so.10)

Hello all,

I am trying to set up the Jetson Nano using Docker and the existing containers. I have reviewed several threads on this forum but have not been able to fix the issues I am having. My assumption is that the Docker container cannot reach the CUDA libraries.

Setup:

  • Jetson Nano Development Kit 4 GB
  • JetPack 4.6.1 [L4T 32.7.1]
  • CUDA 10.2 (nvcc: Cuda compilation tools, release 10.2, V10.2.300)

First attempt
Based on the Dockerfile in dusty-nv/jetson-containers (Machine Learning Containers for NVIDIA Jetson and JetPack-L4T), I tried to build a container from l4t-base:r32.7.1 with torch and torchvision.

FROM nvcr.io/nvidia/l4t-base:r32.7.1

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3-pip \
        python3-dev \
        libopenblas-base \
        libopenblas-dev \
        libopenmpi-dev \
        openmpi-bin \
        openmpi-common \
        gfortran \
        libomp-dev \
        git \
        libjpeg-dev \
        zlib1g-dev \
        libpython3-dev \
        libavcodec-dev \
        libavformat-dev \
        libswscale-dev \
        build-essential \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir setuptools Cython wheel
RUN pip3 install --no-cache-dir -U jetson-stats
RUN pip3 install --no-cache-dir --verbose numpy

# PyTorch (for JetPack 4.6 DP)
ARG PYTORCH_URL=https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl
ARG PYTORCH_WHL=torch-1.10.0-cp36-cp36m-linux_aarch64.whl

RUN wget --quiet --show-progress --progress=bar:force:noscroll --no-check-certificate ${PYTORCH_URL} -O ${PYTORCH_WHL} && \
    pip3 install --no-cache-dir --verbose ${PYTORCH_WHL} && \
    rm ${PYTORCH_WHL}

# torchvision 0.11.1
ARG TORCHVISION_VERSION=v0.10.0
# Compute capabilities: 5.3 (Nano/TX1), 6.2 (TX2), 7.2 (Xavier); CUDA 10.2 cannot target 8.7
ARG TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
RUN printenv && echo "torchvision version = $TORCHVISION_VERSION" && echo "TORCH_CUDA_ARCH_LIST = $TORCH_CUDA_ARCH_LIST"

RUN git clone https://github.com/pytorch/vision torchvision && \
    cd torchvision && \
    git checkout ${TORCHVISION_VERSION} && \
    python3 setup.py install && \
    cd ../ && \
    rm -rf torchvision

The build fails at the torchvision install step because it cannot find libcurand.so.10.

Second attempt
I use the existing torch container provided by NVIDIA:

nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

If I import torch there, it also fails because it cannot find libcurand.so.10.
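
One quick way to confirm whether the CUDA libraries are visible inside a container is to try loading them directly, independent of torch. The snippet below is a minimal sketch using Python's ctypes; the helper name `can_load` is hypothetical, and the library names other than libcurand.so.10 are assumed examples for CUDA 10.2, not taken from the error output.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found.
        return False


# libcurand.so.10 is the library from the error above; the other two are
# assumed examples of CUDA 10.2 libraries that a PyTorch build may need.
CUDA_LIBRARIES = ["libcurand.so.10", "libcudart.so.10.2", "libcublas.so.10"]

if __name__ == "__main__":
    for name in CUDA_LIBRARIES:
        status = "found" if can_load(name) else "MISSING"
        print(f"{name}: {status}")
```

On JetPack 4.x the CUDA libraries are typically mounted from the host by the NVIDIA container runtime rather than shipped in the image, so a MISSING result here usually points at how the container was started and not at the image contents.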

I also noticed that a GPG public key is missing in this torch container, so no other packages can be installed.

Looking forward to your reply.

Hi,

PyTorch 1.10 should be paired with torchvision v0.11.1 rather than v0.10.0.
Could you update the setting below and try again?

ARG TORCHVISION_VERSION=v0.11.1

Thanks.

Ah yes, but I get the same error. I think it has to do with Docker not being able to find the CUDA libraries. Is there a specific way I should run my Docker container?

OK, I made some progress. I have to somehow add the nvidia runtime to it.

So I made this Dockerfile:

FROM nvcr.io/nvidia/l4t-pytorch:r32.7.1-pth1.10-py3

ENV DEBIAN_FRONTEND=noninteractive

RUN pip3 install --upgrade pip
RUN pip3 install --no-cache-dir -U jetson-stats

I build and run it with:

sudo docker build -t container-test .
sudo docker run -it --runtime nvidia container-test

Now I can open python3 in the docker container and import torch.
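
The manual check can also be scripted. This is a minimal sketch, assuming it is run with python3 inside the container; the function name `cuda_report` is hypothetical, and it relies only on `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.get_device_name()`.

```python
def cuda_report():
    """Try to import torch and summarize whether CUDA is usable."""
    try:
        import torch
    except (ImportError, OSError) as error:
        # A missing shared library such as libcurand.so.10 surfaces here.
        return {"torch_imported": False, "error": str(error)}

    report = {
        "torch_imported": True,
        "torch_version": torch.__version__,
        "cuda_version": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Device 0 is the integrated GPU on a single-GPU board.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running it once with and once without `--runtime nvidia` shows whether the runtime flag is what makes the import succeed.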

So the issue is Docker-related. How can I use the nvidia runtime during a build?

Hi @camiel1, if you set your default docker runtime to nvidia, then it will be used during build operations as well: https://github.com/dusty-nv/jetson-containers#docker-default-runtime
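
For reference, the linked section describes adding a "default-runtime" entry to /etc/docker/daemon.json and then restarting the Docker service. The sketch below shows that change as a small Python helper that only prints the resulting JSON and does not touch the file; the function name and the starting configuration are assumptions, and the file on your device may contain other keys.

```python
import copy
import json


def with_nvidia_default_runtime(config):
    """Return a copy of a Docker daemon config with nvidia as the default runtime."""
    updated = copy.deepcopy(config)
    runtimes = updated.setdefault("runtimes", {})
    # Register the nvidia runtime only if it is not already declared.
    runtimes.setdefault(
        "nvidia", {"path": "nvidia-container-runtime", "runtimeArgs": []}
    )
    updated["default-runtime"] = "nvidia"
    return updated


# Assumed starting point: a daemon.json that declares the runtime but no default.
existing = {
    "runtimes": {
        "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
    }
}

print(json.dumps(with_nvidia_default_runtime(existing), indent=4))
```

After writing the result to /etc/docker/daemon.json, restart Docker (for example with `sudo systemctl restart docker`) and check that `docker info` reports nvidia as the default runtime before rebuilding.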

Thanks for the input; your repository has been really helpful so far! I now get an error when installing torchvision. I will post more details later.

For the record, I was able to complete all the steps outside Docker.
