PyTorch for Jetson

Yes I have them all-- given my learning path I was trying to implement within a conda environment. I take it this is not ideal-- should I scrap conda altogether?

I haven’t built PyTorch or torchvision with conda before, so you may want to try it outside of conda.

I’ve recently built the latest PyTorch and torchvision versions from source in a conda environment, for both Python 3.7 and 3.8. I followed this guide: https://qengineering.eu/install-pytorch-on-jetson-nano.html and it worked great for me.

Here’s a link to the wheel I built for 3.7 (I unfortunately lost the 3.8 wheel) if you want to try to pip install it.
https://drive.google.com/file/d/1Vc91fabuEU291gDLphvHNBSYgPMVYucO/view?usp=sharing
Just make sure you have all the dependencies listed in the qengineering.eu link above.

Thanks @Shane_md ! That’s great–
Just wondering, how much memory do you have? With nominal work I maxed out my 32 GB card.

Yeah that unfortunately is not surprising.

Personally, I’d say 64G is the minimum, and 128G+ is optimal

Hi, I’m trying to build PyTorch 1.7 from source with Python 3.7, but I’m running into some issues. Can you give me some suggestions? Here is the link

Hi @bangwhe, please see my reply to your topic here: https://forums.developer.nvidia.com/t/compiled-pytorch-error-in-jetson-xavier-agx/178664/5?u=dusty_nv

Hi, when I run from my Jetson Nano:
python3 setup.py install --user

I got:
Illegal instruction (core dumped)

What could be wrong here?

Hi @rafael3, try to run export OPENBLAS_CORETYPE=ARMV8 first. This could be due to a bug in numpy, see below:

Thanks @dusty_nv . I will try doing that!
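
For anyone who wants to apply the same workaround from a Python script rather than the shell, here is a minimal sketch. It assumes the crash comes from OpenBLAS CPU detection when numpy is loaded, as suggested above. The helper name `run_with_openblas_coretype` is made up for illustration; only the `OPENBLAS_CORETYPE=ARMV8` setting comes from this thread.

```python
import os
import subprocess
import sys


def run_with_openblas_coretype(cmd, coretype="ARMV8"):
    """Run cmd in a child process with OPENBLAS_CORETYPE set (hypothetical helper)."""
    env = dict(os.environ)               # copy the current environment
    env["OPENBLAS_CORETYPE"] = coretype  # like `export OPENBLAS_CORETYPE=ARMV8`, but only for the child
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Check that the variable is visible to a child Python process.
    result = run_with_openblas_coretype(
        [sys.executable, "-c", "import os; print(os.environ['OPENBLAS_CORETYPE'])"]
    )
    print(result.stdout.strip())
```

To run the build from the earlier post through it, the call would be something like `run_with_openblas_coretype([sys.executable, "setup.py", "install", "--user"])`.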

Hi all,

I have a Jetson Nano with an up-to-date OS. I installed PyTorch 1.8.0 / torchvision 0.9.0 according to the instructions (I had to install scikit-build at some point). However, after the install I experience very poor performance. A simple “import torch” takes multiple seconds. Executing dbolya/yolact (github) eval.py for a single image takes > 5 min.

Has anyone else experienced such problems? Any workarounds?

Kind regards,
Robin

On my Nano, I just tested it and import torch takes 3 seconds. Importing torch loads quite a few modules and shared libraries, and it does so from the SD card.
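
A rough way to measure the import time yourself, as a sketch only: the helper name `time_import` is hypothetical, and `json` is used as a stand-in module so the snippet runs anywhere. Note that a module already imported in the same process is cached, so only the first import is meaningful.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "json" is only a stand-in; use "torch" on the Jetson.
    print(f"import took {time_import('json'):.3f} s")
```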

Does this project use PyTorch, or is your overall system slower when PyTorch is not even being used?

If you keep an eye on your system memory with tegrastats or jtop when you run this project, is it using a lot of memory and/or swap memory? If you process multiple images in the same script, does the second one take as long? Is the script using GPU through PyTorch or CPU only?
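
If you would rather log memory and swap from inside your own script than watch tegrastats or jtop, a minimal sketch could read `/proc/meminfo` directly. This assumes a Linux system such as the Jetson; the function names `parse_meminfo` and `memory_summary` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_summary(info):
    """Return available RAM and used swap in MiB from a parsed meminfo dict."""
    swap_used_kib = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_available_mib": info.get("MemAvailable", 0) / 1024,
        "swap_used_mib": swap_used_kib / 1024,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(memory_summary(parse_meminfo(f.read())))
    except FileNotFoundError:
        print("/proc/meminfo not found (not a Linux system?)")
```

Calling `memory_summary` before and after processing each image should show whether available memory is dropping and swap use is climbing while the model runs.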

I will run these analysis tools today. However, I also experience complete OS freezes (I can’t click anything for about 10 seconds). In my opinion the detection / segmentation model that I am loading is not very complex, and I am using it on only two images. Is this still to be expected with this type of machine? Do such effects also occur when using similar models such as mrcnn? Would downgrading PyTorch be an option?

This typically indicates that the board is low on memory and may be swapping out to disk. Keep an eye on the memory and try reducing your base memory load with these suggestions:

Hi All,

I was experiencing massive memory usage on my Jetson Nano when running PyTorch 1.8 (installed from the first page of this topic), and I was wondering if anyone had similar issues, or if there was a way around it?

I have a small script that I was profiling with memory_profiler, it looks like this:

import torch
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('size', type=int)
parser.add_argument('--cpu', action='store_true')
args = parser.parse_args()

@profile
def f():
    torch.set_grad_enabled(False)
    torch.cuda._lazy_init()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    if args.cpu:
        device = 'cpu'
    model = torch.nn.Conv2d(1, 1, 1).to(device)
    x = torch.rand(1, 1, args.size, args.size).to(device)
    y = model(x)

if __name__ == '__main__':
    f()

When running the profiler without the cpu flag (i.e., using CUDA) with a size of 100, I get:

python3 -m memory_profiler torchmemscript.py 100
Filename: torchmemscript.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     9  150.906 MiB  150.906 MiB           1   @profile
    10                                         def f():
    11  150.906 MiB    0.000 MiB           1       torch.set_grad_enabled(False)
    12  155.336 MiB    4.430 MiB           1       torch.cuda._lazy_init()
    13  155.336 MiB    0.000 MiB           1       device = 'cuda' if torch.cuda.is_available() else 'cpu'
    14  155.336 MiB    0.000 MiB           1       if args.cpu:
    15                                                 device = 'cpu'
    16 1889.699 MiB 1734.363 MiB           1       model = torch.nn.Conv2d(1, 1, 1).to(device)
    17 1890.414 MiB    0.715 MiB           1       x = torch.rand(1, 1, args.size, args.size).to(device)
    18 2634.496 MiB  744.082 MiB           1       y = model(x)

But when running with the cpu flag I get:

python3 -m memory_profiler torchmemscript.py 100 --cpu
Filename: torchmemscript.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     9  151.055 MiB  151.055 MiB           1   @profile
    10                                         def f():
    11  151.055 MiB    0.000 MiB           1       torch.set_grad_enabled(False)
    12  155.359 MiB    4.305 MiB           1       torch.cuda._lazy_init()
    13  155.359 MiB    0.000 MiB           1       device = 'cuda' if torch.cuda.is_available() else 'cpu'
    14  155.359 MiB    0.000 MiB           1       if args.cpu:
    15  155.359 MiB    0.000 MiB           1           device = 'cpu'
    16  157.754 MiB    2.395 MiB           1       model = torch.nn.Conv2d(1, 1, 1).to(device)
    17  157.754 MiB    0.000 MiB           1       x = torch.rand(1, 1, args.size, args.size).to(device)
    18  160.051 MiB    2.297 MiB           1       y = model(x)

Does anyone know why the memory usage is so high? It’s killing my torch apps, which otherwise don’t use much RAM at all.

Hi @an_actual_toaster, this doesn’t seem specific to Jetson, as using CUDA in PyTorch also uses extra memory on PC/x86. I believe it is loading compiled CUDA kernel code binaries and libraries like cuDNN. In my experience, if you have swap mounted, much of it can be swapped out.
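
To compare this overhead across machines without memory_profiler, one rough approach is to record how much the process's peak resident memory grows around a single call. This is a sketch under assumptions: it relies on `ru_maxrss` being reported in kB, which holds on Linux, and the helper names are hypothetical. The workload below is a stand-in so it runs without PyTorch.

```python
import resource


def peak_rss_mib():
    """Peak resident set size of this process in MiB (ru_maxrss is in kB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def peak_rss_growth(fn):
    """Call fn() and return how much the process's peak RSS grew, in MiB."""
    before = peak_rss_mib()
    fn()
    return peak_rss_mib() - before


if __name__ == "__main__":
    # Stand-in workload: allocate and fill roughly 50 MiB.
    growth = peak_rss_growth(lambda: b"x" * (50 * 1024 * 1024))
    print(f"peak RSS grew by {growth:.1f} MiB")
```

On the Jetson, with torch already imported, passing something like `lambda: torch.zeros(1).to("cuda")` should show roughly the cost of initializing CUDA on its own, separate from the model.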

Hi @dusty_nv, thanks for your response!

Great point!
In my googling I’ve found that I can reduce RAM usage by compiling PyTorch without several of the built-in CUDA kernels, so I will try doing this.

If you find out how to disable some of the built-in kernels in PyTorch, this would be good to know. Let us know how it goes!

Hi, sorry, I am new and do not understand how I can install PyTorch with this:

Code : ribc

MD5 (torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl) = a08c545c05651e6a9651010c13f3151f

Can you help me, please?

Hi @Victorine, I’m not sure what your baidu link points to, but you can download a wheel from the first post in this topic and install it with the instructions given there. You should pick a wheel that supports the version of JetPack you are running.
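
If you do end up with a wheel file that was published alongside an MD5 checksum like the one quoted above, you could check the download before installing it. A minimal sketch using Python’s standard `hashlib` follows; the helper names `md5_of_file` and `matches_md5` are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 with a published checksum, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For the file named in the question, the check would be `matches_md5("torch-1.1.0a0+b457266-cp36-cp36m-linux_aarch64.whl", "a08c545c05651e6a9651010c13f3151f")`. A match only shows the download is intact, not that the wheel suits your JetPack version.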