PyTorch for Jetson - version 1.8.0 now available

Hi, I created a Python 3.6 environment with Miniforge-pypy3 and installed torch-1.6.0-cp36-cp36m-linux_aarch64.whl in it. I then ran into a problem installing torchvision v0.7.0 (the version paired with PyTorch v1.6):

~/torchvision$ sudo python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

But torch is already installed in this environment, and I can import it:

~/torchvision$ python
Python 3.6.11 | packaged by conda-forge | (default, Nov 27 2020, 18:40:28)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.6.0

How can I solve this? Thanks.

Hi @1754387338, I haven’t used miniforge before, but it appears that the torchvision setup script, when run through python3, isn’t able to find the torch package you installed. If you run python3, are you able to import torch?

If not, you might want to symbolically link python3 to python so that the torchvision install script can find it.
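One way to narrow this down is to ask each interpreter where it lives and whether it can see torch. The snippet below is only a minimal sketch using the standard library; the function name `describe_interpreter` is made up for illustration. Note also that `sudo python3` often resolves to the system interpreter rather than the one from a conda/miniforge environment, so it is worth running this both with and without sudo.

```python
import importlib.util
import sys


def describe_interpreter(module_name="torch"):
    """Report which interpreter is running and whether it can see a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,       # path of the running interpreter
        "prefix": sys.prefix,               # environment it belongs to
        "module_found": spec is not None,   # True if the module is importable
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter("torch").items():
        print(f"{key}: {value}")
```

If `python` and `sudo python3` print different `executable` paths and only one of them reports `module_found: True`, the install script is being run by an interpreter that does not have torch.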

hi,

I tried to use:
pip3 install torch-1.4.0-cp36-cp36m-linux_aarch64.whl
And
pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl

I have downloaded the two whl files. But every time it went wrong.
The error is displayed as follows:
Defaulting to user installation because normal site-packages is not writeable
ERROR: torch-1.6.0-cp36-cp36m-linux_aarch64.whl is not a supported wheel on this platform.

In addition, I am using Python 3.8, because I want to deploy YOLOv5 on a Jetson Nano, and it requires Python 3.8.
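For context on the "not a supported wheel" error: a wheel's filename encodes the Python version, ABI and platform it was built for (here `cp36`, `cp36m` and `linux_aarch64`). The sketch below is a deliberately simplified check, not pip's real logic (pip also handles tags such as `py3` and `abi3`); the helper names `wheel_tags` and `looks_compatible` are hypothetical.

```python
import platform
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the tags, even when an
    # optional build tag sits between the version and the Python tag.
    python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)[-3:]
    return python_tag, abi_tag, platform_tag


def looks_compatible(filename, python_tag=None, machine=None):
    """Rough check: does the wheel target this CPython version and CPU?"""
    if python_tag is None:
        python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    if machine is None:
        machine = platform.machine()
    wheel_python, _abi, wheel_platform = wheel_tags(filename)
    python_ok = wheel_python == python_tag
    platform_ok = wheel_platform == "any" or wheel_platform.endswith("_" + machine)
    return python_ok and platform_ok


if __name__ == "__main__":
    name = "torch-1.7.0-cp36-cp36m-linux_aarch64.whl"
    print(wheel_tags(name), looks_compatible(name))
```

Under Python 3.8 the interpreter tag is `cp38`, so a `cp36` wheel is rejected no matter which PyTorch version it contains.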

Why don’t these containers include OpenCV yet? Seriously, what’s the point of this? Your Docker system is so useless. I’ve spent a week on headaches over missing libraries that should have been included by default. All these dumb methods to pipe over libraries from my main OS… Does anyone who actually does ML work make these?

Hi @industrialacc0, we try to keep the size down on the l4t-pytorch and l4t-tensorflow containers by not installing extra libraries into those. You can use these base containers as a starting point and create your own containers from them (e.g. via Dockerfiles)

You can see the Dockerfile commands for installing the version of OpenCV that comes with JetPack here, so you needn’t mount it from the host if you don’t want to:

https://github.com/NVIDIA-AI-IOT/jetbot/blob/cbf6f1b5a3285ad3bbb18ec552ed79846d1e2529/docker/base/Dockerfile#L47

You can also change/rebuild these base containers yourself, as the base Dockerfiles and build scripts for the containers are open-sourced here:

https://github.com/dusty-nv/jetson-containers

And yes, I do a lot of ML work in PyTorch but have not needed OpenCV as a frequent dependency, and it is a large library. I’ll consider adding it to the larger l4t-ml container, but some folks may want their own customized or newer version of OpenCV, and a pre-existing version could complicate installing it.

Hi @zlbzailushang, these PyTorch wheels were built for Python 3.6, so they wouldn’t work on Python 3.8. However some other users on this topic and in the forums have been able to rebuild PyTorch for Python 3.8. Please see this post for more info:

thanks a lot.

I need a VPN to download it with wget.

Thanks. I was very frustrated when I wrote this and didn’t expect a useful solution, so I had just built OpenCV from scratch. Your solution will probably work a lot better. In another topic, your colleague recommended changing CSV files and such.

On a side note, I would expect most people to use Jetsons as machine vision inference devices (at least I do). Having pure ML containers without the libraries most commonly used in combination with them, such as OpenCV, seems like a bit of a waste, but that’s up to you guys I guess.

In any case thank you!

@industrialacc0 thanks for your feedback and for following up. In the next version of the l4t-ml container, I have added OpenCV 4.1.1 (the one that comes with JetPack). l4t-ml is the big container with PyTorch/TensorFlow/JupyterLab/scipy/sklearn/etc., so it can give users a good starting point.

@dusty_nv , something seems wrong with the prebuilt PyTorch v1.7.0. To reproduce on AGX Xavier:

Let’s start with the official Docker image:
docker run --rm -it --gpus all nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3

Now, inside the Docker container, we do:
wget https://nvidia.box.com/shared/static/wa34qwrwtk9njtyarwt5nvo6imenfy26.whl -O torch-1.7.0-cp36-cp36m-linux_aarch64.whl

pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl

python3

import torch
import torch.nn.functional as F
x = torch.tensor([12.345])
print(x)
print(F.softmax(x, dim=0))

The expected results are:
12.345
1.

The actual results are:
12.
nan

The above issue only happens to CPU tensors. If we do x = x.cuda() before printing, then we’ll see the correct results. I suspect something is wrong with the CPU library.

Also, PyTorch v1.6.0 doesn’t have the above problem.
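The softmax symptom above can be turned into a quick self-test to run after installing a wheel. This is only a sketch under my own assumptions: the helper names are made up, and it checks just this one symptom (a softmax over a CPU float32 tensor must be finite and sum to 1), so passing it does not prove the build is otherwise correct.

```python
import math


def is_valid_softmax(probs, tol=1e-4):
    """True if `probs` are finite, non-negative and sum to 1 within `tol`."""
    probs = list(probs)
    if not probs or not all(math.isfinite(p) and p >= 0.0 for p in probs):
        return False
    return abs(sum(probs) - 1.0) <= tol


def torch_cpu_softmax(values):
    """Compute softmax on a CPU float32 tensor and return a Python list."""
    import torch  # imported lazily so the checker above works without PyTorch

    x = torch.tensor(values, dtype=torch.float32, device="cpu")
    return torch.softmax(x, dim=0).tolist()


if __name__ == "__main__":
    try:
        ok = is_valid_softmax(torch_cpu_softmax([12.345]))
    except ImportError:
        print("PyTorch is not installed in this interpreter")
    else:
        print("CPU softmax looks sane" if ok else "CPU softmax returned bad values")
```

On an affected build, the single-element case from the repro returns nan, so the check fails.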

Hi @yin, it seems this issue may be related to this PyTorch 1.7.0 issue: https://github.com/pytorch/pytorch/issues/49157

Not sure if it has been addressed in 1.7.1 or not. I don’t think it is particular to how I built the wheel. For now, you may want to stick with 1.6.0 or try building the wheel for 1.7.1 to see if that fixes it (although I would expect that to be indicated in the PyTorch issue above at some point)

Thanks for the prompt reply @dusty_nv.

I built v1.7.1 from source following your instructions, and encountered the same issue.

This issue is specific to Jetson. It doesn’t happen on the desktop version of PyTorch downloaded from PyTorch’s conda channel.

David

Hmm I’m not sure what the issue is, sorry - from what I can tell, it doesn’t appear to happen on other datatypes or CUDA tensors. You may want to check with the PyTorch folks for a more in-depth look.

Same problem here. I tried PyTorch 1.7.0 on an L4T 32.4.3/Xavier NX box.

$ python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.empty(5, 3)
>>> print(x)
tensor([[ 3.6228e+12,  1.7796e-43,  5.0356e-07],
        [ 0.0000e+00, -1.0300e-19,  1.7796e-43],
        [ 4.6333e-07,  0.0000e+00, -8.1911e-20],
        [ 1.7796e-43,  7.1668e+11,  1.7796e-43],
        [ 3.6220e+12,  1.7796e-43,  7.1775e+11]])
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 1.],
        [1., 0., 0.]])
>>> x = torch.tensor([5.5, 3])
>>> print(x)
tensor([6., 3.])
>>> b = torch.randn(2).cuda()
>>> print(b)
tensor([-0.3275,  1.3559], device='cuda:0')

This problem can be solved by disabling NEON, as is done in the current master branch (line 29):

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cpu/vec256/vec256_float_neon.h

Basically, this issue comes from the fact that the current NEON code doesn’t compile correctly with GCC 7. So, the master branch only enables NEON support for GCC > 8.3.

I did that, recompiled v1.7.1 and the problem went away. I suggest you do the same for the official build.
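To illustrate the rule described above: the real gate is a preprocessor condition in the C++ header, so the Python below is only a sketch that mirrors the "GCC > 8.3" threshold as stated in this post. The function names are hypothetical, and the input is assumed to be a plain version string such as 7.5.0.

```python
import re


def parse_version(version):
    """Extract (major, minor) from a version string such as '7.5.0'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def neon_path_enabled(gcc_version, minimum=(8, 3)):
    """Mirror the rule from the post: NEON code only for GCC newer than 8.3."""
    return parse_version(gcc_version) > minimum


if __name__ == "__main__":
    for candidate in ("7.5.0", "8.3.0", "8.4.0", "9.3.0"):
        state = "enabled" if neon_path_enabled(candidate) else "disabled"
        print(f"GCC {candidate}: NEON vectorized path {state}")
```

With a GCC 7 toolchain the vectorized path is switched off, which is the case where the bad CPU results were seen.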

Thank you.

David

Ah thanks for tracking that down, David. I will cherry-pick PR #47099 from PyTorch master and rebuild/repost the wheel.

I can confirm this issue is reproducible on both the Jetson Nano 2GB and 4GB using the official SD card image (jp441) and this prebuilt PyTorch 1.7.0 wheel. It is very hard for me to accept this level of bug from a huge enterprise like NVIDIA…

Code

import torch
torch.set_printoptions(precision=4)

x = [0.1, 0.2, 0.3]
x_t_cpu = torch.Tensor(x)
x_t_cuda = torch.Tensor(x).cuda()

print('x', x)
print('x_t_cpu', x_t_cpu)
print('x_t_cuda', x_t_cuda)

Output

x [0.1, 0.2, 0.3]
x_t_cpu tensor([0., 0., 0.])
x_t_cuda tensor([0.1000, 0.2000, 0.3000], device='cuda:0')

@mfkenson the issue has already been confirmed above and traced to PyTorch bug #47098, which is a regression in PyTorch 1.7 and newer. PyTorch is not an NVIDIA product and I personally build these wheels for the convenience of the community. Sorry for the inconvenience - I will be re-posting the patched wheel shortly.

OK, the updated PyTorch 1.7 wheel that fixes the bug above has been uploaded here:

The patch used for this is here: https://gist.github.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1#file-pytorch-1-7-jetpack-4-4-1-patch

It appears the fix has already been made in PyTorch master, so releases after PyTorch 1.7.1 should not need to be patched manually.