PyTorch for Jetson

Hi @naisy, I previously updated the PyTorch 1.9 patch to include the NEON patch (after finding it was causing runtime computation errors)

https://gist.github.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1#file-pytorch-1-9-jetpack-4-5-1-patch


That is the patch from which I built it on my Jetson. This is the corresponding location in the PyTorch 1.9 source code:

https://github.com/pytorch/pytorch/blob/d69c22dd61a2f006dcfe1e3ea8468a3ecaf931aa/aten/src/ATen/Context.cpp#L181

This is the Context.cpp file. I have built from source and set the variables, so why do I still have this problem?

 terminate called after throwing an instance of 'c10::Error'
  what():  quantized engine QNNPACK is not supported
Exception raised from setQEngine at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/aten/src/ATen/Context.cpp:181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xa0 (0x7f445c7300 in /home/yuantian/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb4 (0x7f445c36b4 in /home/yuantian/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: at::Context::setQEngine(c10::QEngine) + 0x138 (0x7f5aeb0940 in /home/yuantian/.local/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #3: THPModule_setQEngine(_object*, _object*) + 0x94 (0x7f5fa12364 in /home/yuantian/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #5: python3() [0x52ba70]
frame #7: python3() [0x529978]
frame #9: python3() [0x5f4d34]
frame #11: python3() [0x5a7228]
frame #12: python3() [0x582308]
frame #16: python3() [0x529978]
frame #17: python3() [0x52b8f4]
frame #19: python3() [0x52b108]
frame #24: __libc_start_main + 0xe0 (0x7f78952720 in /lib/aarch64-linux-gnu/libc.so.6)
frame #25: python3() [0x420e94]

Aborted (core dumped)
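While debugging this, one way to avoid the hard abort is to check which quantized engines the build actually supports before selecting one. Below is a sketch using the standard torch.backends.quantized API; whether 'qnnpack' appears in the list depends entirely on how your particular wheel was compiled.

```python
# Sketch: probe the supported quantized engines instead of setting one blindly.
# Which engines appear depends on the build flags of this PyTorch wheel.
import torch

engines = torch.backends.quantized.supported_engines
print(engines)  # e.g. ['none', 'qnnpack'] on a build with QNNPACK enabled

if 'qnnpack' in engines:
    torch.backends.quantized.engine = 'qnnpack'
else:
    print("QNNPACK is not compiled into this build; keeping the default engine")
```

If 'qnnpack' is missing from that list, the wheel was built without USE_PYTORCH_QNNPACK, and setting the engine will always raise the c10::Error above.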

@Andrey1984 @dusty_nv
Kind regards

Hi @naisy,

Thanks for the reply. I tried out your suggestion and used the latest updated patch from @dusty_nv (https://github.com/pytorch/pytorch/blob/d69c22dd61a2f006dcfe1e3ea8468a3ecaf931aa/aten/src/ATen/Context.cpp#L181) to build PyTorch. It successfully resolved my ISSUE 1. Now it takes about 10 seconds to transfer the tensor from CPU to GPU.

However, ISSUE 2, the error from the torch.solve function, still appears. Should I just ignore the error, or does it mean anything in this case?

Thank you and I do appreciate your help very much!

Regards,
@ziimiin

Hi @ziimiin, I haven’t built MAGMA before, but are you able to test run MAGMA independently of PyTorch?

Maybe there is some setting you need to compile MAGMA with to enable the correct GPU architectures for Jetson? (sm53, sm62, sm72)

EDIT: It appears they need to be added in the MAGMA makefile:
https://github.com/CEED/MAGMA/blob/79b982c88d64c660a04353fbac77fe00580060aa/Makefile#L93
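For reference, MAGMA's build normally picks up the target architectures from the GPU_TARGET variable (in make.inc or on the make command line). A sketch for the Jetson families mentioned above, assuming your MAGMA release reads GPU_TARGET the way recent versions do:

```shell
# Hypothetical make.inc fragment for a Jetson MAGMA build.
# sm_53 = Nano/TX1, sm_62 = TX2, sm_72 = Xavier
cat > make.inc <<'EOF'
GPU_TARGET = sm_53 sm_62 sm_72
EOF
grep 'GPU_TARGET' make.inc
```

MAGMA then derives the matching -gencode flags for nvcc from that list.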


Hi @ziimiin
I built MAGMA and PyTorch on my Jetson and tried your ISSUE 2.
When using MAGMA in Python, it seems that you need to call magma_init() first.

If I do not call magma_init(), I get this error message:

Error in magma_getdevice_arch: MAGMA not initialized (call magma_init() first) or bad device

Calling magma_init() resolves this error.


I have uploaded the Docker image I used (17.6 GB).
I confirmed that it also works on the Jetson Nano.

sudo docker run --runtime=nvidia --rm -it -u jetson naisy/jetson451-pytorch-magma
python3
import torch
import ctypes

# Load MAGMA and initialize it before using the GPU solver
magma_path = '/usr/local/magma/lib/libmagma.so'
libmagma = ctypes.cdll.LoadLibrary(magma_path)
libmagma.magma_init()

# Batched solve of A X = B on the GPU
A = torch.randn(2,3,3).cuda()
B = torch.randn(2,3,4).cuda()
torch.linalg.solve(A, B)

# Clean up MAGMA when done
libmagma.magma_finalize()

The built source code is under /opt/github/.


Hi @naisy,

I appreciate your help! Thanks so much.

Same to @dusty_nv . Thanks for the response and help as well.

Regards,
@ziimiin

Hi,

I am trying to use libtorch (v1.9.0 precompiled) with Qt (5.9.5) on a Jetson Nano.
I included the library:

 LIBS += -L"/home/jetson/libs/libtorch/lib" -ltorch

but I got this error:

skipping incompatible /home/jetson/libs/libtorch/lib/libtorch.so when searching for -ltorch
cannot find -ltorch

I read online that this error usually means the library doesn't match the system architecture (e.g., an x86 library on an x64 system), but when I check the lib:

file libtorch.so 

I got

libtorch.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=710c62b2bda3d3b7bb737e5026de30fd9aca88e0, not stripped

So I'm a bit lost.

Does anyone have an idea?

Regards
Stéphane

Hi @steph27, when I run the same thing and inspect libtorch.so, I see that it was built for aarch64 (not x86_64):

~/.local/lib/python3.6/site-packages/torch/lib$ file libtorch.so
libtorch.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, BuildID[sha1]=a00d5bc6e166568806d09407e6cf14873c5687e4, not stripped

Are you sure that you installed the PyTorch 1.9 wheel for Jetson from this link? https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl


Thanks for the path!

As I couldn't find the library and the include files (for C++), I had downloaded the pre-built version (https://download.pytorch.org/libtorch/cu102/libtorch-win-shared-with-deps-1.9.0%2Bcu102.zip), which is a Windows build and therefore not compatible with the Jetson Nano.

Thank you for your help.

Regards
@steph27

No problem - by the way, you can find these paths by running pip3 show <package-name>, and it will print out where the package is installed.
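For example (using pip itself as a stand-in package here, since the exact output depends on what you have installed; on the Jetson you would query torch instead):

```shell
# Print where pip installed a package; the 'Location' field is the
# site-packages directory. 'pip' is just a placeholder package name.
python3 -m pip show pip | grep -i '^location'
```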


After I ran 'sudo pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl',
it showed that torch 1.7.0 was successfully installed.
But when I 'import torch',
it still shows 'No module named torch'.
I don't know why.
Thanks

Hi @619914127, are you running python3 to import torch? Can you run python3 -c 'import torch'?

It still doesn't work.
I can't even find it in my pip list.

I believe it’s because you are running python and pip which are the Python 2.7 versions - whereas you installed the wheel for Python 3.6 with pip3. Try using python3 and pip3 instead.
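A quick sanity check for this kind of mix-up is to ask each command which interpreter it actually launches (the versions shown are what you would typically see on JetPack; your output may differ):

```shell
# Which Python does each command map to? On JetPack, 'python' is usually
# Python 2.7, while the aarch64 wheels target Python 3.6.
python3 --version
python3 -c "import sys; print(sys.executable)"
```

If python3 reports 3.6 and the wheel name contains cp36, the pip3-installed package will be importable from python3 (not from python).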

I have a Jetson Xavier NX with JetPack 4.4, and I am unable to install PyTorch 1.5.1 through the commands given above.
PyTorch v1.5.0

Please tell me how to install it as soon as possible.

It installed successfully at first, but then I installed the other libraries listed below:
numpy
pillow
scikit-learn
tqdm
albumentations
jupyterlab
matplotlib
natsort
scikit-image>=0.16.1
tensorboardx
tensorboard
torchcontrib
tifffile
pygit2
Archiconda

and torch was automatically removed. Now it will not install again and throws an error.

Hi @vashisht.akshat.rn.04, I believe your issue is that you are using pip to try to install the wheel, when you should be using pip3 (pip is for Python 2.7 and pip3 is for Python 3.6, and these wheels are for Python 3.6).

Also, please make sure the wheel you are installing is compatible with the version of JetPack you have. That wheel you linked to is only for JP 4.4 Developer Preview (L4T R32.4.2). If you are on the JP 4.4 production release (L4T R32.4.3) you would want to use a more recent wheel.
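On the Jetson you can read the L4T release from /etc/nv_tegra_release to pick the matching wheel. As a sketch, here is how that header line maps to a version string; the helper l4t_version is hypothetical (not part of any tool), and the sample line is illustrative:

```python
# Hypothetical helper: turn an /etc/nv_tegra_release header line into an
# L4T version string. On a real Jetson you would read the file itself.
import re

def l4t_version(release_line):
    m = re.search(r"R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", release_line)
    return f"R{m.group(1)}.{m.group(2)}" if m else None

sample = "# R32 (release), REVISION: 4.3, GCID: 21589087, BOARD: t186ref"
print(l4t_version(sample))  # -> R32.4.3, i.e. the JetPack 4.4 production release
```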

I could not import torchaudio.

Following the instructions in l4t-pytorch,

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3

I started the l4t-pytorch container.
But when I ran the code

import torch
import torchaudio

I got a warning message

/usr/local/lib/python3.6/dist-packages/torchaudio-0.9.0a0+33b2469-py3.6-linux-aarch64.egg/torchaudio/backend/utils.py:67: UserWarning: No audio backend is available.
  warnings.warn('No audio backend is available.')

Then I installed sox:

pip3 install sox

But I still got the same warning message.

Does anyone know how to solve this problem? Any advice would be appreciated.
Thanks.
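One way to narrow this down is to ask torchaudio which backends it can actually see; list_audio_backends() has been available since torchaudio 0.8. Note that the pip sox package is only a Python wrapper around the SoX CLI; the sox_io backend requires torchaudio itself to have been built against libsox:

```python
# Sketch: list the audio backends this torchaudio build detects. An empty
# list means torchaudio was compiled without sox/soundfile support, and
# pip-installing 'sox' will not change that.
try:
    import torchaudio
    print(torchaudio.list_audio_backends())  # [] when no backend is built in
except ImportError:
    print("torchaudio is not installed in this environment")
```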

Hmm, I tried rebuilding the container after adding pip3 install sox (I already had apt-get install sox in the Dockerfile), but still got this issue. It will require further investigation. If you figure it out, let me know.