PyTorch for Jetson

Why don’t these containers include OpenCV yet? Seriously, what’s the point of this? Your Docker system is so useless that I’ve spent a week on headaches from missing libraries that should have been included by default. All these dumb methods to pipe over libraries from my main OS… Does anyone who actually does ML work make these?

Hi @industrialacc0, we try to keep the size down on the l4t-pytorch and l4t-tensorflow containers by not installing extra libraries into them. You can use these base containers as a starting point and create your own containers from them (e.g., via Dockerfiles).

You can see the Dockerfile commands for installing the version of OpenCV that comes with JetPack here, so you needn’t mount it from the host if you don’t want to:

https://github.com/NVIDIA-AI-IOT/jetbot/blob/cbf6f1b5a3285ad3bbb18ec552ed79846d1e2529/docker/base/Dockerfile#L47
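
Once OpenCV is installed in your image, a quick sanity check from inside the container looks like this (a minimal sketch, assuming the cv2 Python bindings are what got installed):

import cv2

print(cv2.__version__)            # version of OpenCV baked into the image
print(cv2.getBuildInformation())  # build options, e.g. to confirm CUDA support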

You can also change/rebuild these base containers yourself, as the base Dockerfiles and build scripts for the containers are open-sourced here:

https://github.com/dusty-nv/jetson-containers

And yes, I do a lot of ML work in PyTorch but have not needed OpenCV as a frequent dependency, and it is a large library. I’ll consider adding it to the larger l4t-ml container, but some folks may want their own customized or newer version of OpenCV, and a pre-existing installation could complicate that install.

Hi @zlbzailushang, these PyTorch wheels were built for Python 3.6, so they wouldn’t work on Python 3.8. However some other users on this topic and in the forums have been able to rebuild PyTorch for Python 3.8. Please see this post for more info:
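
As a side note, the cp36 in the wheel filename means it was built for CPython 3.6, so a quick way to check which interpreter an environment is running (just a sketch) is:

import sys

# A cp36-cp36m wheel requires CPython 3.6; a cp38 wheel requires CPython 3.8
print(sys.version_info)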

Thanks a lot.

I need a VPN to download it with wget.

Thanks. I was very frustrated when I wrote this and didn’t expect a useful solution, so I had just built OpenCV from scratch. Your solution will probably work a lot better. In another topic your colleague recommended changing the CSV files and such.

On a side note, I would expect most people to use Jetsons as machine-vision inference devices (at least I do). Having pure ML containers without the libraries most commonly used alongside them, such as OpenCV, seems like a bit of a waste, but that’s up to you guys, I guess.

In any case thank you!

@industrialacc0 thanks for your feedback and following up. In the next version of the l4t-ml container, I have added OpenCV 4.1.1 (the version that comes with JetPack). l4t-ml is the big container with PyTorch/TensorFlow/JupyterLab/SciPy/scikit-learn/etc., so it can give users a good starting point.
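
Once you pull that container, a quick way to confirm the stack is there is to import each package and print its version (a rough sketch; the module names used are the standard ones):

import torch
import tensorflow as tf
import cv2
import sklearn
import scipy

# Print the version of each major package shipped in the container
for name, mod in [('torch', torch), ('tensorflow', tf), ('cv2', cv2), ('sklearn', sklearn), ('scipy', scipy)]:
    print(name, mod.__version__)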

@dusty_nv, something seems wrong with the prebuilt PyTorch v1.7.0. To reproduce on an AGX Xavier:

Let’s start with the official Docker image:
docker run --rm -it --gpus all nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3

Now, inside the Docker container, we do:
wget https://nvidia.box.com/shared/static/wa34qwrwtk9njtyarwt5nvo6imenfy26.whl -O torch-1.7.0-cp36-cp36m-linux_aarch64.whl

pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl

python3

import torch
import torch.nn.functional as F
x = torch.tensor([12.345])
print(x)
print(F.softmax(x))

The expected results are:
12.345
1.

The actual results are:
12.
nan

The above issue only happens to CPU tensors. If we do x = x.cuda() before printing, then we’ll see the correct results. I suspect something is wrong with the CPU library.

Also, PyTorch v1.6.0 doesn’t have the above problem.
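
For anyone else hitting this, a minimal way to see the discrepancy and the interim workaround (just compute on the GPU) is:

import torch
import torch.nn.functional as F

x = torch.tensor([12.345])
print(F.softmax(x, dim=0))         # affected wheel: prints tensor([nan]) on the CPU
print(F.softmax(x.cuda(), dim=0))  # CUDA path is unaffected: prints tensor([1.], device='cuda:0')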

Hi @yin, it seems this issue may be related to this PyTorch 1.7.0 issue: https://github.com/pytorch/pytorch/issues/49157

Not sure if it has been addressed in 1.7.1 or not; I don’t think it is particular to how I built the wheel. For now, you may want to stick with 1.6.0, or try building the wheel for 1.7.1 to see if that fixes it (although I would expect that to be noted in the PyTorch issue above at some point).


Thanks for the prompt reply @dusty_nv.

I built v1.7.1 from source following your instructions, and encountered the same issue.

This issue is specific to Jetson. It doesn’t happen on the desktop version of PyTorch downloaded from PyTorch’s conda channel.

David

Hmm, I’m not sure what the issue is, sorry - from what I can tell, it doesn’t appear to happen with other datatypes or on CUDA tensors. You may want to check with the PyTorch folks for a more in-depth look.

The same problem here. I have tried PyTorch 1.7.0 on an L4T 32.4.3/Xavier NX box.

$ python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.empty(5, 3)
>>> print(x)
tensor([[ 3.6228e+12,  1.7796e-43,  5.0356e-07],
        [ 0.0000e+00, -1.0300e-19,  1.7796e-43],
        [ 4.6333e-07,  0.0000e+00, -8.1911e-20],
        [ 1.7796e-43,  7.1668e+11,  1.7796e-43],
        [ 3.6220e+12,  1.7796e-43,  7.1775e+11]])
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 1.],
        [1., 0., 0.]])
>>> x = torch.tensor([5.5, 3])
>>> print(x)
tensor([6., 3.])
>>> b = torch.randn(2).cuda()
>>> print(b)
tensor([-0.3275,  1.3559], device='cuda:0')

This problem can be solved by disabling NEON, as is done in the current master branch (line 29):

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cpu/vec256/vec256_float_neon.h

Basically, this issue comes from the fact that the current NEON code doesn’t compile correctly under GCC 7. So, the master branch only enables NEON support for GCC > 8.3.

I did that, recompiled v1.7.1 and the problem went away. I suggest you do the same for the official build.
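
If you want to check which compiler a particular wheel was built with before deciding whether it needs the patch, torch.__config__.show() prints the build settings (a quick sanity check, not proof that the NEON path was enabled):

import torch

# Shows the compiler version and build flags the wheel was compiled with
print(torch.__config__.show())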

Thank you.

David


Ah thanks for tracking that down, David. I will cherry-pick PR #47099 from PyTorch master and rebuild/repost the wheel.

I can confirm this issue is reproducible on both the Jetson Nano 2GB and 4GB using the official SD card image (JetPack 4.4.1) and this prebuilt PyTorch 1.7.0 wheel. It is very hard for me to accept this level of bug from a huge enterprise like NVIDIA…

Code

import torch
torch.set_printoptions(precision=4)

x = [0.1, 0.2, 0.3]
x_t_cpu = torch.Tensor(x)
x_t_cuda = torch.Tensor(x).cuda()

print('x', x)
print('x_t_cpu', x_t_cpu)
print('x_t_cuda', x_t_cuda)

Output

x [0.1, 0.2, 0.3]
x_t_cpu tensor([0., 0., 0.])
x_t_cuda tensor([0.1000, 0.2000, 0.3000], device='cuda:0')

@mfkenson the issue has already been confirmed above and traced to PyTorch bug #47098, which is a regression in PyTorch 1.7 and newer. PyTorch is not an NVIDIA product and I personally build these wheels for the convenience of the community. Sorry for the inconvenience - I will be re-posting the patched wheel shortly.

OK, the updated PyTorch 1.7 wheel that fixes the bug above has been uploaded here:

The patch used for this is here: PyTorch patch for building on JetPack >= 4.4 · GitHub

It appears the fix has already been made in PyTorch master, so future releases after PyTorch 1.7.1 should not need this manual patch.

Thank you for the build! I thought the build was officially from NVIDIA. I did not mean to blame you, @dusty_nv; I am sorry. In fact, I do appreciate your contribution. Wishing you a very merry Christmas!

No worries - it appears that testing of the ARM CPU vectorized tensor operations fell through the cracks of both the PyTorch/ATen maintainers and myself. For future releases I will be sure to test CPU ops as well. My testing to date has consisted of running a bunch of models through torchvision and making sure their inference accuracy is close to the published accuracy (script here) - that was using CUDA, though.
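
Something as simple as the following CPU vs. CUDA consistency check would likely have caught it (a rough sketch, not the actual test script):

import torch
import torch.nn.functional as F

x = torch.randn(1000)
cpu_out = F.softmax(x, dim=0)
cuda_out = F.softmax(x.cuda(), dim=0).cpu()

# The CPU and CUDA paths should agree to within floating-point tolerance
assert torch.allclose(cpu_out, cuda_out, atol=1e-6), "CPU and CUDA softmax diverge"
print("CPU/CUDA softmax results match")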

Wish you and your family a wonderful holiday as well!


Has anyone here managed to build PyTorch with MAGMA? I did, and encountered a strange performance issue. Consider the following code:

import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

A = A.cuda()
B = B.cuda()

torch.solve(B, A)

When I first installed PyTorch + MAGMA, the above code took 13 minutes to finish. Throughout that time the GPU was idle and one CPU core was running at 100%. After running it once, the next time I ran it the time dropped to a normal level (a few seconds).

The above issue can be reproduced in a Docker container. Whenever I restart the container, the above code takes 13 minutes again; after that, it takes only a few seconds to run.
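
To make the difference concrete, here is a simple wall-clock timing around the call (just a sketch):

import time
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).cuda()
B = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).cuda()

start = time.time()
torch.solve(B, A)          # first call after a container restart is the slow one
torch.cuda.synchronize()   # make sure all GPU work has finished before stopping the clock
print('elapsed: %.1f s' % (time.time() - start))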

Again, this issue is specific to Jetson – it doesn’t happen with the desktop version of PyTorch.

Any idea why?

David