PyTorch for Jetson

themozel · June 17, 2020, 11:48am

Thank you
I think I need to reflash the device

ruelj2 · June 18, 2020, 9:06pm

Thanks for this piece of code! Really appreciated. But my concern is about building it in a Docker image. For now it seems almost impossible, since MAGMA depends on cuBLAS and it is not available in Nvidia’s l4t-base image. Also, Nvidia doesn’t share CUDA binaries for arm64 installation.
@dusty_nv do you see a solution?

Thanks a lot!

dusty_nv · June 19, 2020, 2:40pm

Hi @ruelj2, I see cublas in the l4t-base image:

$  sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-base:r32.4.2
# ls -ll /usr/lib/aarch64-linux-gnu/libcublas*
lrwxrwxrwx 1 root root       15 Jun 19 14:38 /usr/lib/aarch64-linux-gnu/libcublas.so -> libcublas.so.10
lrwxrwxrwx 1 root root       22 Jun 19 14:38 /usr/lib/aarch64-linux-gnu/libcublas.so.10 -> libcublas.so.10.2.2.89
-rw-r--r-- 1 root root 80530928 Oct 29  2019 /usr/lib/aarch64-linux-gnu/libcublas.so.10.2.2.89
lrwxrwxrwx 1 root root       17 Jun 19 14:38 /usr/lib/aarch64-linux-gnu/libcublasLt.so -> libcublasLt.so.10
lrwxrwxrwx 1 root root       24 Jun 19 14:38 /usr/lib/aarch64-linux-gnu/libcublasLt.so.10 -> libcublasLt.so.10.2.2.89
-rw-r--r-- 1 root root 33235064 Oct 29  2019 /usr/lib/aarch64-linux-gnu/libcublasLt.so.10.2.2.89

Are you running it with --runtime nvidia? To use it during docker build operations, you should set the default-runtime to nvidia: https://github.com/dusty-nv/jetson-containers#docker-default-runtime

ruelj2 · June 19, 2020, 3:29pm

Thank you very much for this valuable information. Although --runtime nvidia cannot be specified during the build stage, it is possible to “build my app within the container manually and commit the resulting image” using docker run -it --rm --runtime nvidia (ref).

dusty_nv · June 19, 2020, 3:32pm

To get the nvidia runtime during build stage, you can set the default-runtime to nvidia in your docker daemon configuration, as shown here: https://github.com/dusty-nv/jetson-containers#docker-default-runtime
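
For reference, a minimal sketch of that configuration change in Python. It assumes the daemon configuration lives at /etc/docker/daemon.json and that the nvidia runtime entry points at the nvidia-container-runtime binary, as described in the linked README. The helper name with_nvidia_default is hypothetical; it only builds the dictionary and leaves writing the file to you.

```python
import json

def with_nvidia_default(config):
    """Return a copy of a Docker daemon config with nvidia as the default runtime."""
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    # Assumed runtime entry; an entry already present in the config is kept as-is.
    runtimes.setdefault("nvidia", {"path": "nvidia-container-runtime", "runtimeArgs": []})
    updated["runtimes"] = runtimes
    updated["default-runtime"] = "nvidia"
    return updated

# Print the JSON that would go into /etc/docker/daemon.json (assumed location).
print(json.dumps(with_nvidia_default({}), indent=4))
```

After saving the file, the Docker service needs to be restarted (for example with sudo systemctl restart docker) before docker build picks up the new default runtime.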

hyunjaecho1213 · June 20, 2020, 12:33pm

@dusty_nv
I am having difficulty installing Pytorch with the following error:

Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/derek/.local/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

I realize that some people above had the same error, so I installed it using the following commands, but the same error still persists.

wget https://nvidia.box.com/shared/static/c3d7vm4gcs9m728j6o5vjay2jdedqb55.whl
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl

I used an SD card to flash the image.

Ubuntu 18.04
CUDA Version 10.2.89
Not sure which JetPack version, but the SD card image was downloaded today from https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit#write on a Mac host.

mura.ryo03302 · June 21, 2020, 5:54am

I can’t install PyTorch on a Jetson Xavier NX. The same problem occurs.
import error: from torch._C import *

qczone · June 22, 2020, 1:36pm

I notice that PyTorch has been updated to 1.5.1. Has the PyTorch build for the Jetson Nano been updated as well? If so, can you give me a link? Thanks ^_^

qczone · June 22, 2020, 1:55pm

Installing PyTorch v1.5.0 solves this problem (the libcudart.so.10.0 import error), because the CUDA version in the new image is 10.2.
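
The mismatch can be checked directly: the library name in the ImportError carries the CUDA version the wheel was built against, and that can be compared with the installed CUDA version. A minimal sketch, where cudart_version and matches_installed are hypothetical helper names and the strings are the ones quoted earlier in this thread:

```python
import re

def cudart_version(import_error_message):
    """Extract 'major.minor' from a message that names libcudart.so.X.Y, or None."""
    match = re.search(r"libcudart\.so\.(\d+)\.(\d+)", import_error_message)
    return f"{match.group(1)}.{match.group(2)}" if match else None

def matches_installed(required, installed):
    """Compare a required 'major.minor' with an installed version such as '10.2.89'."""
    return installed.split(".")[:2] == required.split(".")[:2]

message = "ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory"
required = cudart_version(message)
# The installed version here is the one reported above (CUDA Version 10.2.89).
print(required, matches_installed(required, "10.2.89"))
```

If the two versions differ, the wheel was built for a different CUDA release than the one in the flashed image, and a wheel built for the installed release is needed.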

dusty_nv · June 22, 2020, 3:06pm

In the past I haven’t built the *.1 minor releases because of time/support constraints, but I will pick up PyTorch 1.6 when it’s released. You can try building 1.5.1 from source though.

dusty_nv · June 22, 2020, 3:07pm

Hi @mura.ryo03302, which version of PyTorch did you install and which link did you use to download the wheel?

mura.ryo03302 · June 22, 2020, 3:43pm

I installed PyTorch 1.4 for JetPack 4.4 DP:
Python 3.6 - torch-1.4.0-cp36-cp36m-linux_aarch64.whl

dusty_nv · June 22, 2020, 4:44pm

Hmm. Does it have any other error text, or does it only say import error: from torch._C import *?

Also, you are trying to import this from a python3 environment, correct?

If you run the following command from a terminal, does it show any broken links or libraries it can’t find?

$ ldd ~/.local/lib/python3.6/site-packages/torch/_C.cpython-36m-aarch64-linux-gnu.so
        linux-vdso.so.1 (0x0000007f83729000)
        libtorch_python.so => /home/nvidia/.local/lib/python3.6/site-packages/torch/./lib/libtorch_python.so (0x0000007f82983000)
        libshm.so => /home/nvidia/.local/lib/python3.6/site-packages/torch/./lib/libshm.so (0x0000007f82969000)
        libnvToolsExt.so.1 => /usr/local/cuda-10.2/lib64/libnvToolsExt.so.1 (0x0000007f82950000)
        libtorch.so => /home/nvidia/.local/lib/python3.6/site-packages/torch/./lib/libtorch.so (0x0000007f58c5f000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f58c0d000)
        libc10_cuda.so => /home/nvidia/.local/lib/python3.6/site-packages/torch/./lib/libc10_cuda.so (0x0000007f58bd0000)
        libc10.so => /home/nvidia/.local/lib/python3.6/site-packages/torch/./lib/libc10.so (0x0000007f58b7b000)
        libcudart.so.10.2 => /usr/local/cuda-10.2/lib64/libcudart.so.10.2 (0x0000007f58b07000)
        libmpi_cxx.so.20 => /usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20 (0x0000007f58adc000)
        libmpi.so.20 => /usr/lib/aarch64-linux-gnu/libmpi.so.20 (0x0000007f589eb000)
        libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f58858000)
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f58834000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f586db000)
        /lib/ld-linux-aarch64.so.1 (0x0000007f836fe000)
        librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f586c4000)
        libcufft.so.10 => /usr/local/cuda-10.2/lib64/libcufft.so.10 (0x0000007f4c631000)
        libcurand.so.10 => /usr/local/cuda-10.2/lib64/libcurand.so.10 (0x0000007f48519000)
        libcublas.so.10 => /usr/lib/aarch64-linux-gnu/libcublas.so.10 (0x0000007f4383b000)
        libcudnn.so.8 => /usr/lib/aarch64-linux-gnu/libcudnn.so.8 (0x0000007f43805000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f4374b000)
        libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f4370e000)
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f436f9000)
        libnuma.so.1 => /usr/lib/aarch64-linux-gnu/libnuma.so.1 (0x0000007f436db000)
        libopenblas.so.0 => /usr/lib/aarch64-linux-gnu/libopenblas.so.0 (0x0000007f43013000)
        libcusparse.so.10 => /usr/local/cuda-10.2/lib64/libcusparse.so.10 (0x0000007f3a960000)
        libopen-rte.so.20 => /usr/lib/aarch64-linux-gnu/libopen-rte.so.20 (0x0000007f3a8ce000)
        libopen-pal.so.20 => /usr/lib/aarch64-linux-gnu/libopen-pal.so.20 (0x0000007f3a81c000)
        libhwloc.so.5 => /usr/lib/aarch64-linux-gnu/libhwloc.so.5 (0x0000007f3a7d8000)
        libcublasLt.so.10 => /usr/lib/aarch64-linux-gnu/libcublasLt.so.10 (0x0000007f38812000)
        libgfortran.so.4 => /usr/lib/aarch64-linux-gnu/libgfortran.so.4 (0x0000007f3870e000)
        libutil.so.1 => /lib/aarch64-linux-gnu/libutil.so.1 (0x0000007f386fb000)
        libltdl.so.7 => /usr/lib/aarch64-linux-gnu/libltdl.so.7 (0x0000007f386e2000)

hyunjaecho1213 · June 23, 2020, 3:33am

Thanks! It worked

mura.ryo03302 · June 24, 2020, 7:42am

The error and its execution environment are as follows:

xavier@xavier-desktop:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
xavier@xavier-desktop:~$  python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xavier/.local/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory

I ran the ldd command and found that some links are broken. Please let me know how to solve this problem.

xavier@xavier-desktop:~$ ldd ~/.local/lib/python3.6/site-packages/torch/_C.cpython-36m-aarch64-linux-gnu.so
	linux-vdso.so.1 (0x0000007f943b2000)
	libgtk3-nocsd.so.0 => /usr/lib/aarch64-linux-gnu/libgtk3-nocsd.so.0 (0x0000007f94335000)
	libtorch_python.so => /home/xavier/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so (0x0000007f935cc000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f935b7000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f9358b000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f93432000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f94387000)
	libshm.so => /home/xavier/.local/lib/python3.6/site-packages/torch/lib/libshm.so (0x0000007f93418000)
	libnvToolsExt.so.1 => /usr/local/cuda/lib64/libnvToolsExt.so.1 (0x0000007f933ff000)
	libtorch.so => /home/xavier/.local/lib/python3.6/site-packages/torch/lib/libtorch.so (0x0000007f6970e000)
	libc10_cuda.so => /home/xavier/.local/lib/python3.6/site-packages/torch/lib/libc10_cuda.so (0x0000007f696d1000)
	libc10.so => /home/xavier/.local/lib/python3.6/site-packages/torch/lib/libc10.so (0x0000007f6967c000)
	libcudart.so.10.2 => /usr/local/cuda/lib64/libcudart.so.10.2 (0x0000007f69608000)
	libmpi_cxx.so.20 => not found
	libmpi.so.20 => not found
	libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f69474000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f69450000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f69439000)
	libcufft.so.10 => /usr/local/cuda/lib64/libcufft.so.10 (0x0000007f5d3a6000)
	libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x0000007f5928e000)
	libcublas.so.10 => /usr/lib/aarch64-linux-gnu/libcublas.so.10 (0x0000007f545b0000)
	libcudnn.so.8 => /usr/lib/aarch64-linux-gnu/libcudnn.so.8 (0x0000007f5457a000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f544c0000)
	libgomp.so.1 => /usr/lib/aarch64-linux-gnu/libgomp.so.1 (0x0000007f54483000)
	libnuma.so.1 => /usr/lib/aarch64-linux-gnu/libnuma.so.1 (0x0000007f54465000)
	libmpi_cxx.so.20 => not found
	libmpi.so.20 => not found
	libopenblas.so.0 => /usr/lib/aarch64-linux-gnu/libopenblas.so.0 (0x0000007f53d9d000)
	libcusparse.so.10 => /usr/local/cuda/lib64/libcusparse.so.10 (0x0000007f4b6ea000)
	libcublasLt.so.10 => /usr/lib/aarch64-linux-gnu/libcublasLt.so.10 (0x0000007f49724000)
	libgfortran.so.4 => /usr/lib/aarch64-linux-gnu/libgfortran.so.4 (0x0000007f49621000)
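
In a long ldd listing like this one, the unresolved entries can be filtered out programmatically. A minimal sketch, assuming the output has been captured as text; missing_libraries is a hypothetical helper name, and the sample is a shortened excerpt of the listing above.

```python
def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found', without duplicates."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found" and name not in missing:
            missing.append(name)
    return missing

sample = """\
    linux-vdso.so.1 (0x0000007f943b2000)
    libcudart.so.10.2 => /usr/local/cuda/lib64/libcudart.so.10.2 (0x0000007f69608000)
    libmpi_cxx.so.20 => not found
    libmpi.so.20 => not found
    libmpi_cxx.so.20 => not found
"""
print(missing_libraries(sample))
```

Each name it returns is a shared library the dynamic loader could not resolve, which is what triggers the ImportError when torch is imported.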

Andrey1984 · June 24, 2020, 10:36am

You may try searching for the file:

find ~ -name 'libmpi*'

otherwise

sudo apt install mlocate
sudo updatedb
locate libmpi

If the library is not present on the system, it will need to be installed; otherwise the library search paths will need to be adjusted, e.g. something like:

export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/openmpi/lib:$LD_LIBRARY_PATH

Reference: “ImportError: libmpi_cxx.so.1:, the setup of LD_LIBRARY_PATH after install openmpi-bin (Ubuntu18.04) - Solved”, Issue #3499, microsoft/CNTK, GitHub
Reference: “mpi.h: No such file or directory”, Issue #32, NVIDIA/nccl-tests, GitHub
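
The file search can also be done from Python, which makes it easy to look in several system directories at once. A minimal sketch; find_libraries is a hypothetical helper name, and the directories listed are only assumptions about where the library might live on a Jetson.

```python
from pathlib import Path

def find_libraries(pattern, roots):
    """Return sorted paths under the given roots whose file name matches the glob pattern."""
    found = []
    for root in roots:
        root_path = Path(root)
        # Skip roots that do not exist on this system.
        if root_path.is_dir():
            found.extend(str(path) for path in root_path.rglob(pattern))
    return sorted(found)

# Assumed candidate directories; adjust them to your system.
print(find_libraries("libmpi*", ["/usr/lib/aarch64-linux-gnu", "/usr/local/lib"]))
```

An empty list suggests the library is not installed in those locations; a non-empty list gives the directory that the loader needs to be able to see.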

dusty_nv · June 24, 2020, 4:24pm

Thanks Andrei, yes you should be able to install it with sudo apt-get install libopenmpi-dev
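
After installing the package, one way to confirm that the dynamic loader can now resolve the library is to try loading it with ctypes. A minimal sketch; can_load is a hypothetical helper name, and the soname is the one from the ImportError earlier in this thread.

```python
import ctypes

def can_load(soname):
    """Return True if the dynamic loader can find and load the shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

print("libmpi_cxx.so.20 loadable:", can_load("libmpi_cxx.so.20"))
```

If this reports False after the install, the library is probably in a directory the loader does not search, and import torch would be expected to fail in the same way.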

themozel · June 25, 2020, 6:51am

Hey everyone!
Is there a torch wheel for Python 3.7 for use on a Jetson AGX Xavier?

dusty_nv · June 25, 2020, 3:57pm

Hi @themozel, I build the wheels for Python 3.6. However some others on this thread have been able to build PyTorch from source for Python 3.7. I think you need to run sudo apt-get install python3.7-dev first.
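
The Python version a wheel targets is encoded in its filename, so it can be checked before installing. A minimal sketch based on the standard wheel naming scheme (name-version-pythontag-abitag-platformtag.whl); wheel_tags and interpreter_tag are hypothetical helper names, and interpreter_tag assumes CPython.

```python
import platform
import sys

def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag

def interpreter_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp36' for Python 3.6."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

tags = wheel_tags("torch-1.4.0-cp36-cp36m-linux_aarch64.whl")
print(tags, "interpreter:", interpreter_tag(), platform.machine())
```

A cp36 wheel will be rejected by pip under Python 3.7, whose tag is cp37, which is why a source build is needed there.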

mura.ryo03302 · June 25, 2020, 4:53pm

Thanks Andrey1984, dusty_nv.
All resolved.
The problem was that the library was missing, so I solved it by installing libopenmpi-dev.