Installing torch and torchvision in l4t-jetpack based docker image on Jetson Xavier NX

Hi

I'm using l4t-jetpack:r35.1.0 as the base image and trying to install additional dependencies on top of it. The ultimate goal is to run the Model Converter and Inference SDK of mmdeploy inside a Docker container on a Jetson Xavier NX.

However, I am stuck installing one of the required dependencies, torchvision.

In the Dockerfile, I include:

ENV FORCE_CUDA="1"

# torch
RUN wget https://developer.download.nvidia.com/compute/redist/jp/v50/pytorch/torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl -O torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl &&\
    pip install torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl

RUN apt-get update &&\
    apt-get install -y libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev libopenblas-base libopenmpi-dev  libopenblas-dev --no-install-recommends &&\
    rm -rf /var/lib/apt/lists/*

ENV CUDA_HOME = "/usr/local/cuda/"


# torchvision
RUN git clone --branch v0.13.0 https://github.com/pytorch/vision torchvision &&\
    cd torchvision &&\
    export BUILD_VERSION=0.13.0 &&\
    pip install -e .

When I try to build the image, I get the following error message:

  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Collecting requests
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading Pillow-9.3.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.0 MB)
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (from torchvision==0.13.0) (1.12.0a0+2c916ef.nv22.3)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torchvision==0.13.0) (4.4.0)
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (from torchvision==0.13.0) (1.17.4)
Collecting certifi>=2017.4.17
  Downloading certifi-2022.9.24-py3-none-any.whl (161 kB)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.13-py2.py3-none-any.whl (140 kB)
Collecting idna<4,>=2.5
  Downloading idna-3.4-py3-none-any.whl (61 kB)
Collecting charset-normalizer<3,>=2
  Downloading charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Installing collected packages: certifi, urllib3, idna, charset-normalizer, requests, pillow, torchvision
  Running setup.py develop for torchvision
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/root/workspace/mmdeploy/torchvision/setup.py'"'"'; __file__='"'"'/root/workspace/mmdeploy/torchvision/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
         cwd: /root/workspace/mmdeploy/torchvision/
    Complete output (33 lines):
    No CUDA runtime is found, using CUDA_HOME='= /usr/local/cuda/'
    Building wheel torchvision-0.13.0
    PNG found: False
    Running build on conda-build: False
    Running build on conda: False
    JPEG found: True
    Building torchvision with JPEG image support
    NVJPEG found: False
    FFmpeg found: False
    video codec found: False
    The installed version of ffmpeg is missing the header file 'bsf.h' which is required for GPU video decoding. Please install the latest ffmpeg from conda-forge channel: `conda install -c conda-forge ffmpeg`.
    running develop
    running egg_info
    writing torchvision.egg-info/PKG-INFO
    writing dependency_links to torchvision.egg-info/dependency_links.txt
    writing requirements to torchvision.egg-info/requires.txt
    writing top-level names to torchvision.egg-info/top_level.txt
    reading manifest file 'torchvision.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    /tmp/pip-build-env-2kdphqcs/overlay/lib/python3.8/site-packages/setuptools/config/setupcfg.py:508: SetuptoolsDeprecationWarning: The license_file parameter is deprecated, use license_files instead.
      warnings.warn(msg, warning_class)
    /tmp/pip-build-env-2kdphqcs/overlay/lib/python3.8/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /tmp/pip-build-env-2kdphqcs/overlay/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    /usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
    adding license file 'LICENSE'
    writing manifest file 'torchvision.egg-info/SOURCES.txt'
    running build_ext
    error: [Errno 2] No such file or directory: '= /usr/local/cuda/bin/nvcc'
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/root/workspace/mmdeploy/torchvision/setup.py'"'"'; __file__='"'"'/root/workspace/mmdeploy/torchvision/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.

I think the error lies in "No CUDA runtime is found, using CUDA_HOME='= /usr/local/cuda/'", but I'm not sure how to solve this.
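One observation on the log above: the reported value of CUDA_HOME begins with "= ", and the failing nvcc path is '= /usr/local/cuda/bin/nvcc'. Docker accepts a space-separated `ENV <key> <value>` form in which everything after the first space becomes the value, so a line written as `ENV CUDA_HOME = "/usr/local/cuda/"` would plausibly produce exactly that value, whereas `ENV CUDA_HOME=/usr/local/cuda` (no spaces around `=`) is the key=value form. The shell sketch below only illustrates the difference between the two values; `looks_like_abs_path` is a hypothetical helper, not part of Docker or PyTorch.

```shell
# Returns 0 if the given value looks like an absolute path, 1 otherwise.
# Hypothetical helper for illustration only.
looks_like_abs_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Value reported in the build log above (note the leading "= ").
logged='= /usr/local/cuda/'
# Value the key=value form `ENV CUDA_HOME=/usr/local/cuda` would give.
intended='/usr/local/cuda'

for v in "$logged" "$intended"; do
    if looks_like_abs_path "$v"; then
        echo "ok:  CUDA_HOME='$v'"
    else
        echo "bad: CUDA_HOME='$v'"
    fi
done
```

If this is indeed the cause, removing the spaces around `=` in the ENV line should be enough for the build to look for nvcc in the right place.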

Hi @christian73, are you sure that you are building your container with l4t-jetpack:r35.1.0 as the base image? And if you run the l4t-jetpack container, is nvcc found under /usr/local/cuda/bin? I don't seem to have the same issue.
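A quick way to run that check from a shell inside the container might look like the sketch below. `find_nvcc` is a hypothetical helper; it assumes the CUDA root is either the value of CUDA_HOME or, if that is unset, /usr/local/cuda.

```shell
# Prints the nvcc path and returns 0 if an executable nvcc exists under
# the given CUDA root; prints a message to stderr and returns 1 otherwise.
find_nvcc() {
    root="${1%/}"    # drop one trailing slash, if any
    if [ -x "$root/bin/nvcc" ]; then
        echo "$root/bin/nvcc"
        return 0
    fi
    echo "nvcc not found under '$root/bin'" >&2
    return 1
}

# Check the configured CUDA root, falling back to the default location.
find_nvcc "${CUDA_HOME:-/usr/local/cuda}" || true
```

Running it with the environment of the failing build step would show whether the toolkit is missing or whether CUDA_HOME simply points at the wrong place.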

Also, we have prebuilt PyTorch + torchvision containers here:

The dockerfiles and build scripts for them are found here: https://github.com/dusty-nv/jetson-containers

Hi @dusty_nv
I use the jetpack image from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-jetpack

I cleaned up the dockerfile, so I could include it here:

FROM nvcr.io/nvidia/l4t-jetpack:r35.1.0

RUN apt-get update &&\
    apt-get install -y vim git libspdlog-dev dpkg python3-pip libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev libopenblas-base libopenmpi-dev libopenblas-dev ffmpeg --no-install-recommends &&\
    rm -rf /var/lib/apt/lists/*

WORKDIR /root/workspace

RUN pip install ninja

# Install pytorch
RUN wget https://developer.download.nvidia.com/compute/redist/jp/v50/pytorch/torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl -O torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl &&\
    pip install torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl
    
ENV CUDA_HOME = "/usr/local/cuda/"

ENV FORCE_CUDA="1"

# Install torchvision
RUN git clone --branch v0.13.0 https://github.com/pytorch/vision torchvision &&\
    cd torchvision &&\
    export BUILD_VERSION=0.13.0 &&\
    pip install -e .

I am aware of the pytorch image, but I use the jetpack container because I also need TensorRT, CUDA and cuDNN in my application. I suppose I could also do it the other way around and install these dependencies in the pytorch image instead.

I have looked around the repository a bit, but I can't see how torchvision is built. It seems that the docker_build_ml.sh script only builds pytorch, and that Dockerfile.pytorch builds version v0.4.0 of torchvision?

l4t-pytorch uses l4t-jetpack as its base container on JetPack 5, so l4t-pytorch already includes TensorRT, CUDA, and cuDNN.

The torchvision version that gets built is set dynamically by the docker_build_ml.sh script. For example, this section builds the container with PyTorch 1.13 and torchvision 0.13. It sets the arguments in Dockerfile.pytorch.
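To illustrate the general mechanism, a build script can pass versions into a Dockerfile through `docker build --build-arg`, which overrides the defaults of matching `ARG` instructions. The sketch below only prints such a command instead of running it. The function name, the ARG names (BASE_IMAGE, TORCHVISION_VERSION) and the image tag are placeholders chosen for illustration and are not taken from docker_build_ml.sh; check the script itself for the names it really uses.

```shell
# Prints (does not run) a docker build command that passes the base image
# and torchvision version as build arguments.
# ARG names and the image tag are hypothetical placeholders.
print_build_cmd() {
    base_image="$1"
    vision_version="$2"
    echo "docker build -f Dockerfile.pytorch" \
         "--build-arg BASE_IMAGE=$base_image" \
         "--build-arg TORCHVISION_VERSION=$vision_version" \
         "-t my-l4t-pytorch:test ."
}

print_build_cmd nvcr.io/nvidia/l4t-jetpack:r35.1.0 v0.13.0
```

A default such as v0.4.0 written next to an `ARG` in a Dockerfile would be replaced by whatever value is passed this way at build time.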