PyTorch for Jetson

Hello,

I installed torch using the specified commands, and after that I cloned:

git clone --branch v0.8.1 https://github.com/pytorch/vision torchvision

Before compiling I used:

export BUILD_VERSION=0.8.1

but I still get torchvision v0.8.0a0+45f960c. I need v0.8.1 :( Any help?

Hi,

I have a TX2 with JetPack 4.2.2 and am trying to use torch.multiprocessing to load my models for inference once, and then have several sub-processes that use the model to run inference.

I have used the steps here to build my own PyTorch 1.7.0 successfully, but when I try to use multiprocessing, I get the following error:

RuntimeError: cuda runtime error (71) : operation not supported

I have included a simplified version of my code at the end of this post.

Running the same code on one of my development VMs (without CUDA) works fine; it seems that PyTorch is unable to share the GPU-based tensors across processes?

Any idea what's causing this? The only reference to this error I can find relates to Windows (and the TX2 is on Ubuntu, obviously).

Hope someone can point me in the right direction…

//Ton

Sample code:
from torch.multiprocessing import Process
import torch.multiprocessing as mp
import torch

myModel = None

def test(myModel):
    print("RUNNING THIS ONE")
    print(torch.cuda.is_available())

def _process():
    p = Process(target=test, args=(myModel,))
    p.start()
    p.join(300)

if __name__ == '__main__':
    import model

    myModel = model.model()
    myModel.share_memory()
    mp.set_start_method('spawn')
    _process()

Okay, just found the answer to this myself…

For reference, the underlying issue is that, on the Tegra architecture, CUDA does NOT support IPC. It is only supported on Linux desktop GPUs, not on Jetsons.

(I traced the failing call back to cudaIpcGetMemHandle, which is the clue that led to the final answer.)

So, on Tegra, sharing models (or tensors) across multiple processes is not possible (that's what error 71 means).
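Since CUDA IPC is unavailable on Tegra, one common workaround (my own sketch, not from the posts above) is to keep everything in a single process and fan out with threads, which all share the process's one model and CUDA context. A minimal stdlib-only illustration, where the `model` dict is a hypothetical stand-in for an actual loaded network:

```python
import queue
import threading

# Hypothetical stand-in for a model loaded once in this process;
# threads can share it where separate processes on Tegra cannot.
model = {"name": "ssd-mobilenet"}

def infer(model, results):
    # each worker thread reuses the single in-process model
    results.put(model["name"])

results = queue.Queue()
threads = [threading.Thread(target=infer, args=(model, results)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
outputs = [results.get() for _ in range(3)]
```

With a real PyTorch model you would additionally need to serialize access to the GPU (e.g. with a lock) or batch the requests, but the sharing itself works because there is only one process.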

Hi @hm_habi, see my post above about this - https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-7-0-now-available/72048/655

In short, you do have torchvision v0.8.1, it is just reporting as v0.8.0a0+45f960c. If you look at the torchvision release page for v0.8.1, it has the same commit (45f960c) as reported in the version. I can’t seem to get it to print out v0.8.1 even though that is what is installed, sorry about that.

I was interested because I had the same problem.
YOLOv5 requires torch==1.7.0 and torchvision==0.8.1, and torchvision v0.8.0a0+45f960c does not satisfy that:

ERROR: No matching distribution found for torchvision>=0.8.1

Therefore, I cannot run YOLOv5.
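For what it's worth, pip rejects the build because PEP 440 treats `0.8.0a0` as a pre-release of 0.8.0, which sorts below 0.8.1. A rough sketch of that comparison (this tiny parser is a deliberate simplification, not pip's actual implementation):

```python
import re

def release_tuple(version):
    """Extract the numeric release part of a version string, ignoring the
    local segment ('+...') and stopping at a pre-release marker like 'a0'
    (a rough simplification of PEP 440 ordering)."""
    nums = []
    for part in version.split("+")[0].split("."):
        m = re.match(r"\d+", part)
        if m is None:
            break
        nums.append(int(m.group()))
        if m.end() != len(part):  # trailing letters -> pre-release, e.g. '0a0'
            break
    return tuple(nums)

installed = release_tuple("0.8.0a0+45f960c")  # (0, 8, 0)
required = release_tuple("0.8.1")             # (0, 8, 1)
satisfies = installed >= required             # False, so pip refuses
```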

OK, I think I got to the bottom of it: instead of running sudo python3 setup.py install to build torchvision, run python3 setup.py install --user

export BUILD_VERSION=0.8.1
python3 setup.py install --user

This allows setup.py to pick up the BUILD_VERSION environment variable; before, it was not finding it because it was run with sudo. I have updated the instructions to reflect this.
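The reason the export was lost: by default, sudo resets the environment, so setup.py never sees BUILD_VERSION and falls back to a version derived from the git tree. A rough sketch of that decision (the fallback string here just mirrors the version the thread reports; the real setup.py logic is more involved):

```python
import os

def resolve_version(environ, git_fallback="0.8.0a0+45f960c"):
    # Roughly how the build chooses its version string: an explicit
    # BUILD_VERSION wins; otherwise it is derived from the last release
    # tag plus the current git commit (represented here by the fallback).
    return environ.get("BUILD_VERSION", git_fallback)

# with the export visible, the tagged version is used:
with_export = resolve_version({"BUILD_VERSION": "0.8.1"})
# under sudo the environment is stripped, so the fallback wins:
under_sudo = resolve_version({})
```

(`sudo -E` would preserve the environment too, but the `--user` install avoids needing root at all.)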


Dear Moderator,

Thank you for the solution.
But a new error occurred.

Adding torchvision 0.8.1 to easy-install.pth file
error: [Errno 13] Permission denied: ‘/home/bhlee/.local/lib/python3.6/site-packages/easy-install.pth’

Please review it one more time.
Best regards,

BHLee

@bhlee
you may try
sudo chown -R bhlee /home/bhlee
then try again
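The underlying problem is that the earlier sudo install left files under ~/.local owned by root, so the non-root install could no longer write easy-install.pth there. A quick stdlib check for that situation (the path is taken from the error message above; `os.access` only approximates what the installer actually does):

```python
import os
import tempfile

def can_write(path):
    # True if the current user can create/modify files in `path`
    return os.access(path, os.W_OK)

# the directory from the error message; if a prior `sudo` install left it
# root-owned, can_write() returns False and chown -R fixes it
site_packages = os.path.expanduser("~/.local/lib/python3.6/site-packages")
if os.path.isdir(site_packages):
    print("writable:", can_write(site_packages))

# sanity check on a directory we just created and therefore own:
demo_dir = tempfile.mkdtemp()
assert can_write(demo_dir)
```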


Dear Andrey,

Wishing you good health during the coronavirus outbreak.
I installed torchvision 0.8.1 with your help.
I think this will be of help to many people.
Thank you.

BHLee


I tried to install torchvision on my Jetson Nano, and it gave me an error.
torchvision_error.txt (11.1 KB)

The torch version is 1.6.0, and I am trying to install torchvision v0.7.0.
How can I install torchvision correctly?
These are my installation commands:

sudo apt-get install libjpeg-dev zlib1g-dev
git clone --branch v0.7.0 https://github.com/pytorch/vision torchvision
cd torchvision
export BUILD_VERSION=0.7.0 
sudo python3 setup.py install

Hi @kevintgbd, I don’t see an actual error in your log - did it simply quit building without an additional message, or was it just taking a long time? If it abruptly quit, you may want to mount SWAP memory. If it was taking a long time, torchvision can take a while to compile some files.

Also you may want to see the updated install instructions for torchvision:

$ sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
$ git clone --branch <version> https://github.com/pytorch/vision torchvision   # see below for version of torchvision to download
$ cd torchvision
$ export BUILD_VERSION=0.x.0  # where 0.x.0 is the torchvision version  
$ python3 setup.py install --user
$ cd ../  # attempting to load torchvision from build dir will result in import error

If you continue to have problems, you can try using the l4t-pytorch container which comes with PyTorch/torchvision pre-installed.

It was just taking a long time; I'll try it again. I wrongly thought it was an error.

Hi,

I am trying to install the requirements for YOLOv5, but YOLOv5 requires Python 3.8 or later. Is there a way to install torch for Python 3.8?

Thanks in advance.

Hi @berkcanerbol98 , you would need to build PyTorch from source against Python 3.8. I believe there are some others on this thread who have done it, and the procedure was mostly the same as the build instructions in the first post from this topic.

The same warning happens to me and it doesn’t improve. If you solved it, can you explain your solution?

Thank you, I'll try that. But it says there that they're compiled for TX2, Nano, and Xavier. I have a Jetson TX1. Is it possible to deploy YOLOv5 on a TX1?

Nano is the same GPU architecture as TX1, so yes. If you export this environment variable before building PyTorch, it will work on all Jetsons (TX1/TX2/Xavier/Nano):

$ export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
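Those numbers are CUDA compute capabilities: 5.3 covers TX1/Nano, 6.2 is TX2, and 7.2 is Xavier. A small sketch of the mapping (my own summary, not an official table; the actual variable is consumed by PyTorch's build system, not by this code):

```python
# CUDA compute capability per Jetson module (informal summary)
JETSON_SM = {"TX1": "5.3", "Nano": "5.3", "TX2": "6.2", "Xavier": "7.2"}

def arch_list(modules):
    # deduplicate and join in the semicolon-separated form that
    # TORCH_CUDA_ARCH_LIST expects
    return ";".join(sorted({JETSON_SM[m] for m in modules}))

full_list = arch_list(["TX1", "TX2", "Xavier", "Nano"])  # '5.3;6.2;7.2'
```

Building for all three architectures produces larger binaries and a longer build, but one wheel then runs on every Jetson module.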

I’m having a really weird problem.

I’ve already installed torchvision (By following your instructions).

But when I try to run the script “train_ssd.py” (To run the Re-training SSD Mobilenet tutorial) I only get:

Traceback (most recent call last):
  File "train_ssd.py", line 14, in <module>
    from vision.utils.misc import str2bool, Timer, freeze_net_layers, store_labels
ModuleNotFoundError: No module named 'vision'


If I import the torchvision module in the Python shell, it is imported with no problem, and I can see its version too (0.7.0).

I don't know if the lines:

from vision.utils.misc import str2bool, Timer, freeze_net_layers, store_labels
from vision.ssd.ssd import MatchPrior
from vision.ssd.vgg_ssd import create_vgg_ssd

are trying to import modules that are NOT part of torchvision.

Do you have some advice?

Thank you in advance!


Update:
My bad, it was my mistake. I had not downloaded the vision module into the same directory where I have the scripts, models, data, etc.
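For anyone hitting the same traceback: the tutorial's `vision` package appears to belong to the SSD training repository itself, not to torchvision, so Python only finds it when the checkout directory is on sys.path (e.g. you run train_ssd.py from the repository root). A quick stdlib way to check what is actually importable:

```python
import importlib.util

def importable(name):
    # True only if `name` resolves on the current sys.path
    return importlib.util.find_spec(name) is not None

# torchvision is installed system-wide, but the tutorial's local `vision`
# package is only found when the checkout directory is on sys.path:
print(importable("torchvision"))
print(importable("vision"))
```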

Hi again. I am trying to run python3.8 setup.py build, but it gives an error:

[ 97%] Linking CXX shared library ../../lib/libtorch_python.so
[ 98%] Built target torch_python
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 727, in <module>
    build_deps()
  File "setup.py", line 314, in build_deps
    build_caffe2(version=version,
  File "/usr/local/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/usr/local/pytorch/tools/setup_helpers/cmake.py", line 346, in build
    self.run(build_args, my_env)
  File "/usr/local/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '4']' returned non-zero exit status 2.

I couldn’t find a way to fix this problem.

I made a fresh install of l4t-pytorch. It launched the first time, but after a reboot it won't launch anymore and I could no longer SSH in. I consider this another unusable container. I would ask questions, but if you can't make containers without issues like this, I doubt you can help. If you want to help, make containers that launch and allow SSH; so far, all the containers have been a waste of my energy.