PyTorch for Jetson

It might be worth mentioning that I am trying to create such a container from an x86 workstation. I’m using QEMU to emulate an aarch64 architecture. Might that be the problem?

Hmm, try building the container on the Jetson first. I don’t have experience cross-compiling the containers on x86, and I understand it is tricky because the filesystem you are building from is not JetPack.

Also, try starting l4t-base and make sure that you can see the CUDA libs under /usr/local/cuda/lib64 and that they are linked properly:

sudo docker run -it --rm --net=host --runtime nvidia nvcr.io/nvidia/l4t-base:r32.3.1
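Inside the container, a quick check like this (my example) should show the CUDA libraries with their symlinks resolving correctly:

ls -l /usr/local/cuda/lib64
# the .so entries should be symlinks to real libraries, e.g. libcudart.so -> libcudart.so.10.0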

Starting the base container on x86 leads to /usr/local/cuda/lib64 only containing the libcudadevrt.a and libcudart_static.a stubs. I guess that is the issue.

I have tried building the container directly on the Jetson. The CUDA libs were correctly linked, but trying to import torch results in:

>>> import torch
Illegal instruction (core dumped)

Does the same thing happen if you just import numpy?

If so, try setting ENV OPENBLAS_CORETYPE=ARMV8 in your Dockerfile
(Illegal instruction (core dumped) on import for numpy 1.19.5 on ARM64 · Issue #18131 · numpy/numpy · GitHub)
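A quick way to check (my suggestion) is to test numpy on its own inside the container, then retry with the workaround exported in the current shell:

python3 -c "import numpy; print(numpy.__version__)"
# if that also crashes with 'Illegal instruction', apply the OpenBLAS workaround and retry:
export OPENBLAS_CORETYPE=ARMV8
python3 -c "import numpy; print(numpy.__version__)"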

I downloaded and installed the latest JetPack 4.5.1 with SDK Manager, which includes L4T 32.5.1. Now I want to know which version of PyTorch I can download and install.
My device is an NVIDIA Jetson TX2.
I tried to install torch 1.8.0, torch 1.7.0, and torch 1.6.0, but the error says the version is not supported on my platform.

Hello @dusty_nv, that was indeed the case. I finally got a fully working container, thanks a lot.
For anyone interested, here’s the Dockerfile:

FROM nvcr.io/nvidia/l4t-base:r32.3.1

COPY nvidia-l4t-apt-source.list /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
RUN apt-key adv --fetch-key https://repo.download.nvidia.com/jetson/jetson-ota-public.asc

# Update, upgrade and install basics
RUN apt-get update -y
RUN apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++ \
 && apt-get install -y libglib2.0-0 libsm6 libxext6 libxrender-dev nano wget python3-pip pkg-config ffmpeg
RUN python3 -m pip install --upgrade pip

ENV NVIDIA_VISIBLE_DEVICES=all
ENV OPENBLAS_CORETYPE=ARMV8

# Install PyTorch and TorchVision
# Taken from https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-8-0-now-available/72048
RUN wget https://nvidia.box.com/shared/static/mmu3xb3sp4o8qg9tji90kkxl1eijjfc6.whl -O torch-1.1.0-cp36-cp36m-linux_aarch64.whl \
 && apt-get -y install python3-pip libopenblas-base libopenmpi-dev \
 && python3 -m pip install Cython \
 && python3 -m pip install numpy torch-1.1.0-cp36-cp36m-linux_aarch64.whl

RUN apt-get install -y libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev \
  && git clone --branch v0.3.0 https://github.com/pytorch/vision torchvision \
  && cd torchvision \
  && export BUILD_VERSION=0.3.0 \
  && python3 setup.py install --user
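Note for anyone reusing this: the torchvision source build needs the CUDA libraries visible during docker build. On the Jetson the usual workaround (my assumption here, following the jetson-containers docs) is to make nvidia the default Docker runtime, then build:

# add "default-runtime": "nvidia" to /etc/docker/daemon.json first, then:
sudo systemctl restart docker
sudo docker build -t l4t-pytorch-custom .   # the image name is just an example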

Hi @gxapple123, you should be able to use any of those wheels on L4T 32.5.1. Are you trying to install them with pip3?

You might also want to try the l4t-pytorch container, you can use the r32.5.0 tag on 32.5.1.
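For example (the exact tag suffix is my assumption; check the l4t-pytorch page on NGC for the published variants):

sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.7-py3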

I have solved the problem. I flashed JetPack 4.5 and installed PyTorch 1.6.0 by following the official website tutorial. Thank you.

Hi,
I did the following based on this.
I end up with the following root prompt. What should I do next?
There is no torch either.

Hi @ohalim2, when you get that root terminal, that is the command prompt running inside the container. You should use that root terminal to run your Python/PyTorch scripts, because that will run inside the container (which is where PyTorch is installed).

In your bottom picture, that is a new terminal that is running on your host device (not inside the container), so it won’t be able to find PyTorch because that runs outside of the container.

Is there any way to install it on the host device? What should I change in this?

To install PyTorch natively, you would need to download and install the wheel with pip3. You can find the wheels and install instructions in the first post of this topic.
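As a sketch, the native install follows this pattern (the wheel URL is deliberately left as a placeholder; copy the link for your JetPack/L4T version from the first post, and the filename will differ accordingly):

wget <wheel-URL-from-first-post> -O torch-1.8.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install -y python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy torch-1.8.0-cp36-cp36m-linux_aarch64.whl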

All of this is on my Jetson TX2. I ran the following commands in my Jetson TX2’s terminal, and my first post’s screenshots are based on this.

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.7-py3
sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.7-py3

If you are on L4T R32.4.4, that is the correct way to start the l4t-pytorch container. After you run that second command, you’ll get the root prompt (#) which is running inside the container. You can then run Python scripts inside the container that will then be able to use PyTorch.
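Once you are at the # prompt, a quick check like this (my example) confirms PyTorch is installed and can see the GPU:

python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"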

If you want to install PyTorch outside of the container, you would need to install one of the wheels at the top of this topic.

Hi, thanks. Could you explain briefly about this: “If you want to install PyTorch outside of the container, you would need to install one of the wheels at the top of this topic.”

You asked above if you could install PyTorch on your host device, outside of the container. Manually installing the PyTorch wheel from a normal terminal (outside of the container) is how you would do that. The wheels and instructions are in the first post of this topic.


I did the steps above and installed PyTorch 1.8.0 and torchvision 0.9.0. However, I run into an error on the following line of my code:
trainset = torchvision.datasets.CIFAR10(root='/data', train=True, download=True, transform=transforms.ToTensor())

The error is:

You supplied it with the path /data, which, since it starts with a forward slash, means it is a top-level path like /usr or /root. Did you mean data/ instead, if your data dir is under your project?

If the path is indeed /data, then it does not seem that your user has access to it and you need to adjust the permissions.
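To illustrate both options (paths here are just examples):

# option 1: use a directory under your project and pass root='./data' in the Python call
mkdir -p data
# option 2: if you really want the top-level /data, create it and give your user access
sudo mkdir -p /data
sudo chown $USER:$USER /data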


PyTorch 1.6.0 was successfully installed, but an error occurred when I installed torchvision 0.7.0:
subprocess.CalledProcessError: Command '['which', 'aarch64-conda_cos7-linux-gnu-c++']' returned non-zero exit status 1