PyTorch for Jetson

Hi, I’m new to Linux and AI and just bought a Nano to learn a little.
I’m trying to use a pretrained model, torchvision.models.detection.maskrcnn_resnet50_fpn, which requires torchvision >= 0.3, so I installed version 0.5 using the commands from the first page (code below). But when I import torchvision, the version is still 0.2.2. How can I use the version I just installed? THX

$ sudo apt-get install libjpeg-dev zlib1g-dev
$ git clone --branch <version> https://github.com/pytorch/vision torchvision   # see below for version of torchvision to download
$ cd torchvision
$ sudo python setup.py install
$ cd ../  # attempting to load torchvision from build dir will result in import error

Was torchvision installed previously somehow? You might want to remove the torchvision package from your system with pip, delete the git repo you cloned, and try again. What was the ‘git clone’ command that you used?
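For reference, a quick way to check which torchvision copy Python is actually importing, and to clear out an old pip-installed 0.2.2 before rebuilding (a rough sketch; adjust pip vs pip3 and paths to your setup):

$ python3 -c "import torchvision; print(torchvision.__version__, torchvision.__file__)"   # shows which copy gets imported
$ pip3 uninstall torchvision          # removes an old pip-installed copy, if present
$ sudo pip3 uninstall torchvision     # also check the system-wide site-packages
$ rm -rf torchvision                  # delete the old clone, then re-clone and rebuild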

Any plans to build for JetPack 4.3?

Hi rdejana, from my testing the same pip wheel builds work for JetPack 4.3 as well.

Thanks Dusty!

Hi, I have been using a TX1 for a little while, and I have installed PyTorch; I can import it in python3. Recently I installed torchvision, and it seems like it installed successfully, but when I import it I get a RuntimeError, shown below.

nvidia@nvidia-desktop:~$ python3
Python 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch as t
>>> import torchvision as tv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/__init__.py", line 3, in <module>
    from torchvision import models
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/__init__.py", line 12, in <module>
    from . import detection
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/faster_rcnn.py", line 13, in <module>
    from .rpn import AnchorGenerator, RPNHead, RegionProposalNetwork
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/rpn.py", line 11, in <module>
    from . import _utils as det_utils
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/_utils.py", line 19, in <module>
    class BalancedPositiveNegativeSampler(object):
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1219, in script
    _compile_and_register_class(obj, _rcb, qualified_name)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1076, in _compile_and_register_class
    _jit_script_class_compile(qualified_name, ast, rcb)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/_recursive.py", line 222, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 1226, in script
    fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
RuntimeError: 
builtin cannot be used as a value:
at /usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/_utils.py:14:56
def zeros_like(tensor, dtype):
    # type: (Tensor, int) -> Tensor
    return torch.zeros_like(tensor, dtype=dtype, layout=tensor.layout,
                                                        ~~~~~~~~~~~~~ <--- HERE
                            device=tensor.device, pin_memory=tensor.is_pinned())
'zeros_like' is being compiled since it was called from '__torch__.torchvision.models.detection._utils.BalancedPositiveNegativeSampler.__call__'
at /usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0+d2c763e-py3.6-linux-aarch64.egg/torchvision/models/detection/_utils.py:72:12

            # randomly select positive and negative examples
            perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
            perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

            pos_idx_per_image = positive[perm1]
            neg_idx_per_image = negative[perm2]

            # create binary mask from indices
            pos_idx_per_image_mask = zeros_like(
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...  <--- HERE
                matched_idxs_per_image, dtype=torch.uint8
            )
            neg_idx_per_image_mask = zeros_like(
                matched_idxs_per_image, dtype=torch.uint8
            )

            pos_idx_per_image_mask[pos_idx_per_image] = torch.tensor(1, dtype=torch.uint8)
            neg_idx_per_image_mask[neg_idx_per_image] = torch.tensor(1, dtype=torch.uint8)

How can I fix it? THX!

Hi all,

I am trying to install PyTorch v1.3.0 on the Jetson Nano. I was able to run the wget command. However, when I run the second command:

pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl

I am facing the below error.

Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-80g4vkm7/numpy/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sqm1_slh-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-build-80g4vkm7/numpy/

Can anyone please help me on this?

Thanks in advance.
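In case it helps anyone hitting the same thing: that generic failure usually means numpy failed to build, rather than anything being wrong with the torch wheel itself. A hedged first check (package names assumed for Ubuntu/JetPack) would be something like:

$ sudo apt-get install python3-pip python3-dev    # headers numpy needs to compile
$ pip3 install -U setuptools wheel
$ pip3 install numpy                              # install numpy on its own to isolate the error
$ pip3 install torch-1.3.0-cp36-cp36m-linux_aarch64.whl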

I’m getting this error when installing torchvision from source. Should libnvToolsExt.so.1 be there? I don’t see it.

appuser@afd0747ef1dc:~/vision$ python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import torch
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory
appuser@afd0747ef1dc:~/vision$ sudo python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import torch
  File "/home/appuser/.local/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: libnvToolsExt.so.1: cannot open shared object file: No such file or directory

libnvToolsExt.so should have been installed by JetPack as part of the CUDA toolkit - for more info, please see this post:

https://devtalk.nvidia.com/default/topic/1067246/jetson-tx2/how-can-i-install-the-pytorch-/post/5405773/#5405773
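A quick way to confirm the library is actually on the device and visible at runtime (paths assumed for a default JetPack CUDA install):

$ ls /usr/local/cuda/lib64/libnvToolsExt*                          # should list the .so
$ echo $LD_LIBRARY_PATH                                            # the CUDA lib dir should be on here
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH    # add it for the current shell if missing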

I’m getting this same error. Any ideas how to fix this?

I’m getting the same error when attempting to import torchvision as well. It looks like it’s a known issue and they are fixing it in PyTorch v1.4 – which doesn’t help us much.

https://github.com/pytorch/vision/issues/1675

Update:

In order to get torchvision to import with PyTorch v1.3, I ended up installing the following:

torchvision 0.4.2
pillow 6.2.2
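For anyone following along, that combination would be installed roughly like this (a sketch based on the build steps earlier in the thread):

$ pip3 install 'pillow<7'       # 6.2.x; Pillow 7 removed PILLOW_VERSION, which breaks older torchvision
$ git clone --branch v0.4.2 https://github.com/pytorch/vision torchvision
$ cd torchvision
$ sudo python3 setup.py install
$ cd ../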

Thanks lycaass - I’ve updated the instructions in the main post to reflect this torchvision version to use for the time being.

Thanks for the answer. I wasn’t using --runtime nvidia when running the container, so I kept having the issue. Now with the runtime added, I see libnvToolsExt.so there. It looks like I can’t install torchvision during Docker build time.

Yes, I did the same thing to get it working. I just used Pillow 6.1; will try with 6.2.2.
pip3 install matplotlib Pillow==6.1
git clone -b v0.4.2 https://github.com/pytorch/vision

I had an error with torchvision.

I ran the commands below.
$ sudo apt-get install libjpeg-dev zlib1g-dev
$ git clone --branch v0.4.0 https://github.com/pytorch/vision torchvision # see below for version of torchvision to download
$ cd torchvision
$ sudo python setup.py install

Then I get the error below.

Traceback (most recent call last):
  File "setup.py", line 6, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools

Please help me.

Jetson Nano
JetPack 4.2.2
Python 3.6.9
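In case it helps: on JetPack, `sudo python setup.py install` likely invokes Python 2, which may not have setuptools installed. A hedged workaround is to install setuptools for the interpreter you actually want and build with python3:

$ sudo apt-get install python3-setuptools    # or: sudo pip3 install -U setuptools
$ cd torchvision
$ sudo python3 setup.py install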

I think I’m on to something, but I don’t know how to solve it.

When building a Docker image on the Nano (based on nvcr.io/nvidia/l4t-base:r32.3.1), I also get the famous missing libnvToolsExt.so.1 error when the build step gets to installing torchvision (python3 setup.py install). The PATH and LD_LIBRARY_PATH ENVs are set in the Dockerfile.

When I then run the last container left over from the crashed build and execute ‘python3 setup.py install’ from within that container, torchvision gets built and I can import and use it in Python.

I’m thinking the temporary container during the build process is not started with ‘--gpus all’, whereas I do use that option when I run the container myself, so the temporary build container does not have access to the NVIDIA CUDA libs while the manually started container does.

The question now is how I can get the intermediate container during the build process to also use the ‘--gpus all’ option.
Any ideas?

Found the solution for the missing libnvToolsExt.so.1 here:

https://github.com/NVIDIA/nvidia-docker/issues/1033

Don’t forget to restart the Docker daemon after changing the daemon.json file (or reboot the whole device).
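For reference, the fix from that issue boils down to making nvidia the default Docker runtime, so the intermediate build containers also get the CUDA libraries; a sketch of /etc/docker/daemon.json (contents assumed from that issue):

$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
$ sudo systemctl restart docker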

Hi, thanks for the wheel !

When I run PyTorch on the CPU, it only uses one core, i.e. I get 100% CPU usage instead of 400%, when I run something like:

import torch
a = torch.rand(100, 1000, 1000)
b = torch.rand(100, 1000, 1000)

while True:
    c = torch.bmm(a, b)

I also noticed the same behaviour with numpy.
TensorFlow uses all four cores, though.

Any thoughts how to fix that?

Thanks in advance,
Manu
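As a diagnostic (not a guaranteed fix), it may help to check how many threads PyTorch thinks it can use and how it was built to parallelize; the thread count can also be capped by OMP/OpenBLAS environment variables:

$ python3 -c "import torch; print(torch.get_num_threads())"
$ python3 -c "import torch; print(torch.__config__.parallel_info())"
$ OMP_NUM_THREADS=4 python3 bmm_test.py    # bmm_test.py is a hypothetical script containing the loop above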

I have a problem installing torch on the Jetson Xavier AGX.
I have followed these steps:

  1. wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl

  2. pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl

  3. sudo apt-get install libjpeg-dev zlib1g-dev

  4. git clone --branch v0.4.2 https://github.com/pytorch/vision torchvision

  5. cd torchvision

  6. sudo python setup.py install

  7. cd ../

  8. cd pytorch

  9. git clone --recursive --branch v1.3.0 https://github.com/pytorch/pytorch

  10. export USE_NCCL=0

  11. export USE_DISTRIBUTED=0

  12. export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"

  13. export PYTORCH_BUILD_VERSION=1.3.0

  14. export PYTORCH_BUILD_NUMBER=1

  15. sudo apt-get install python3-pip cmake

  16. sudo pip3 install -U setuptools

  17. sudo pip3 install -r requirements.txt

  18. pip3 install scikit-build --user

  19. pip3 install ninja --user

  20. python3 setup.py bdist_wheel

After hours of compilation, I wanted to test some functions of torch, so I created 2 random tensors (A and B) and then used ‘X, LU = torch.solve(A, B)’, but I receive an error about missing LAPACK:

‘RuntimeError: solve: LAPACK library not found in compilation’

What am I doing wrong?

What happened after you installed torch-1.3.0-cp36-cp36m-linux_aarch64.whl with pip3? Did that run ok?

From this related issue, it would seem that you need to install the OpenBLAS package or similar: https://github.com/torch/torch7/issues/174
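Concretely, on Ubuntu/JetPack that would look something like this (package names assumed; PyTorch’s build should then find LAPACK and enable torch.solve):

$ sudo apt-get install libopenblas-dev libopenblas-base   # provides BLAS/LAPACK for the PyTorch build
$ cd pytorch
$ python3 setup.py bdist_wheel                            # rebuild so the LAPACK check passes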