Yep that’s what I was doing before. Here’s the Dockerfile then:
FROM nvcr.io/nvidia/l4t-base:r32.3.1
COPY nvidia-l4t-apt-source.list /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
RUN apt-key adv --fetch-key https://repo.download.nvidia.com/jetson/jetson-ota-public.asc
# Update, upgrade and install basics
RUN apt-get update -y
RUN apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++ \
&& apt-get install -y libglib2.0-0 libsm6 libxext6 libxrender-dev nano wget python3-pip pkg-config ffmpeg
RUN python3 -m pip install --upgrade pip
ENV NVIDIA_VISIBLE_DEVICES=all
# Install PyTorch and TorchVision
# Taken from https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-8-0-now-available/72048
RUN wget https://nvidia.box.com/shared/static/mmu3xb3sp4o8qg9tji90kkxl1eijjfc6.whl -O torch-1.1.0-cp36-cp36m-linux_aarch64.whl \
&& apt-get -y install python3-pip libopenblas-base libopenmpi-dev \
&& python3 -m pip install Cython \
&& python3 -m pip install numpy torch-1.1.0-cp36-cp36m-linux_aarch64.whl
Running with --runtime nvidia and importing torch results in:
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 79, in <module>
from torch._C import *
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
It might be worth mentioning that I am trying to build this container on an x86 workstation. I’m using QEMU to emulate an aarch64 architecture. Might that be the problem?
Hmm, try building the container on the Jetson first. I don’t have experience cross-building the containers on x86, and I understand it is tricky because the filesystem you are building from is not JetPack.
Also, try starting l4t-base and make sure that you can see the CUDA libs under /usr/local/cuda/lib64 and they are linked properly:
sudo docker run -it --rm --net=host --runtime nvidia nvcr.io/nvidia/l4t-base:r32.3.1
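For example, once you are inside that container you could run something like this with python3 (just a minimal sketch; it assumes the nvidia runtime has mounted the JetPack CUDA libraries into the container, and it uses the libcudart.so.10.0 name from your error above):

import ctypes, os

# List what the nvidia runtime mounted from the host JetPack install
print(os.listdir("/usr/local/cuda/lib64"))

# If this raises "cannot open shared object file", the CUDA libraries
# are not visible/linked inside the container
ctypes.CDLL("libcudart.so.10.0")
print("libcudart loaded OK")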
I downloaded and installed the latest JetPack 4.5.1 with SDK Manager, which includes L4T 32.5.1. Now I want to know which version of PyTorch I can download and install.
My device is an NVIDIA Jetson TX2.
I tried to install torch 1.8.0, torch 1.7.0 and torch 1.6.0, but the error says the version is not supported on my platform.
Hi @ohalim2, when you get that root terminal, that is the command prompt running inside the container. You should use that root terminal to run your Python/PyTorch scripts, because that will run inside the container (which is where PyTorch is installed).
In your bottom picture, that is a new terminal that is running on your host device (not inside the container), so it won’t be able to find PyTorch because that runs outside of the container.
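For example, a short script like this (just a sketch) should run at the root prompt inside the container, but will fail with ModuleNotFoundError in the host terminal where PyTorch isn’t installed:

import torch

# This import only succeeds where PyTorch is installed (inside the container here)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())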
To install PyTorch natively, you would need to download and install the wheel with pip3. You can find the wheels and install instructions in this topic:
If you are on L4T R32.4.4, that is the correct way to start the l4t-pytorch container. After you run that second command, you’ll get the root prompt (#) which is running inside the container. You can then run Python scripts inside the container that will then be able to use PyTorch.
If you want to install PyTorch outside of a container, you would need to install one of the wheels at the top of this topic.
Hi, thanks. Could you briefly explain this: “If you want to install PyTorch outside of a container, you would need to install one of the wheels at the top of this topic.”
You asked above if you could install PyTorch on your host device, outside of a container. Manually installing the PyTorch wheel from a normal terminal (outside of the container) is how you would do that. The wheels and instructions are in the first post of this topic.
I did the steps above and installed PyTorch 1.8.0 and torchvision 0.9.0. However, I run into an error on the following line of code:
trainset = torchvision.datasets.CIFAR10(root='/data', train=True, download=True, transform=transforms.ToTensor())
You supplied it with the path /data, which, since it starts with a forward slash, means it is a top-level path like /usr or /root. Did you mean data/ instead, if your data dir is under your project?
If the path is indeed /data, then it does not seem that your user has access to it and you need to adjust the permissions.
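For example (a minimal sketch, assuming you want the dataset stored in a data/ folder under your project rather than at the filesystem root):

import torchvision
import torchvision.transforms as transforms

# Relative path: downloads into ./data under the current working directory,
# which your user can write to without changing permissions
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transforms.ToTensor())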