I’m trying to install PyTorch 1.2.0 and Torchvision 0.4.0 in a Docker image built on l4t-base:r32.3.1 for my Jetson Nano. Here are the relevant parts of my Dockerfile:
FROM nvcr.io/nvidia/l4t-base:r32.3.1
...
$PIP_INSTALL \
numpy \
pandas \
cloudpickle \
Cython \
boto3 \
&& \
$APT_INSTALL \
zlib1g \
zlib1g-dev \
libjpeg-dev \
&& \
...
wget https://nvidia.box.com/shared/static/06vlvedmqpqstu1dym49fo7aapgfyyu9.whl -O torch-1.2.0a0+8554416-cp36-cp36m-linux_aarch64.whl && \
pip3 install torch-1.2.0a0+8554416-cp36-cp36m-linux_aarch64.whl \
&& \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/targets/aarch64-linux/lib/ && \
# Just to see the files in the folder
ls /usr/local/cuda-10.0/targets/aarch64-linux/lib/ && \
git clone --branch v0.4.0 https://github.com/pytorch/vision torchvision && \
cd torchvision && \
sudo python3 setup.py install && \
cd ../ && \
$PIP_INSTALL \
'pillow<7'
I took the above from the instructions in this post.
The build fails when Torchvision's setup.py tries to import torch:
Successfully installed torch-1.2.0a0+8554416
# The lines below are the output of the ls command in the Dockerfile above
libcudadevrt.a
libcudart_static.a
stubs
Cloning into 'torchvision'...
Traceback (most recent call last):
File "setup.py", line 13, in <module>
import torch
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 81, in <module>
from torch._C import *
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
After some searching, I found another user on these forums with a similar issue, but they were missing a different file and never explained how they eventually solved it. As you can see in my Dockerfile, I followed some of the instructions from the Nvidia employee in that thread: I tried exporting the library path, and I listed the contents of the folder, which confirmed that libcudart.so.10.0 is not there.
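For anyone who wants to reproduce the check without building Torchvision, here is a minimal sketch of what I mean by confirming the file is missing. It assumes only the Python standard library; the helper names can_load and find_libs are my own for illustration and do not come from PyTorch or from the linked thread. It asks the dynamic loader for the library directly, then lists whatever libcudart files exist under the CUDA directory.

```python
import ctypes
import glob
import os


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened,
        # which is the same condition behind the ImportError above.
        return False


def find_libs(root, pattern="libcudart*"):
    """Recursively list files under root whose names match pattern."""
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))


if __name__ == "__main__":
    # Library name and directory are taken from the error and Dockerfile above.
    print("loadable:", can_load("libcudart.so.10.0"))
    for path in find_libs("/usr/local/cuda-10.0"):
        print(path)
```

Running this in the container at the same point in the build should show whether only the static libraries are present, without having to wait for the Torchvision clone.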
Any ideas why this file might be missing? FWIW I was able to get this to install fine directly onto my Jetson Nano without Docker.
Many thanks.
EDIT:
Upon further searching and reading, it looks like this might be a CUDA version issue? Am I supposed to bring CUDA in from my host system to the Docker container before I can install Torchvision?