PyTorch for Jetson

Call .cpu() on the tensors before they get passed to torch.solve - I think this should do it:

params = torch.solve(b.cpu(), a.cpu())[0]

Whether there is much benefit to doing it on the GPU depends on the size/dimensions of the tensors (i.e. whether they are large). MAGMA support would be nice to have, but since the MAGMA package isn’t in the APT repo or on pip and needs to be compiled from source, I can’t realistically make it a prerequisite for every user of these wheels. If you want MAGMA support, you would probably need to compile it from source yourself.

I compiled MAGMA, and it has been installed to /usr/local/magma.

When running python3 setup.py bdist_wheel, the line “Compiling without MAGMA support” changed to “Compiling with MAGMA” after I installed it.

But when the build is done and I run the script, I still receive the error that MAGMA is missing.

Could you please help me with this? It’s very important for us to use the tensors on CUDA rather than on the CPU.

Can you try uninstalling PyTorch (e.g. pip uninstall torch) and re-installing the wheel that you built?

If you run this command from Python, it should show MAGMA if your currently installed build was built with MAGMA support:

import torch
print(torch.__config__.show())
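
For anyone who wants to script that check, here is a minimal sketch. It assumes a plain case-insensitive substring match on the text returned by torch.__config__.show() is good enough; the helper names and the sample strings in the comments are hypothetical, and the exact wording of the build summary can differ between PyTorch versions.

```python
def config_mentions(config_text, keyword):
    """Return True if `keyword` occurs anywhere in the build summary (case-insensitive)."""
    return keyword.lower() in config_text.lower()


def installed_torch_mentions(keyword):
    """Check the currently installed PyTorch build; return None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return config_mentions(torch.__config__.show(), keyword)


if __name__ == "__main__":
    # Prints True/False for an installed build, or None when torch is not installed.
    print("MAGMA listed in build config:", installed_torch_mentions("magma"))
```

If this prints False right after a rebuild, the interpreter is most likely still importing the previously installed wheel.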

Where is the wheel saved? I cannot find it in the pytorch folder.

Found it. I uninstalled torch and reinstalled it from the wheel I built.
Looks like it works.

Thank you

OK, great - glad you were able to get it working.

Just FYI for others who may read this thread, the PyTorch wheels get built to the ‘dist’ folder under your PyTorch source tree.
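
A small sketch for locating the built wheel from Python, assuming the default layout described above (a 'dist' folder directly under the source tree). The function name and the example path are hypothetical.

```python
from pathlib import Path


def find_built_wheels(source_tree):
    """Return the .whl files in <source_tree>/dist, sorted by name (empty list if none)."""
    dist_dir = Path(source_tree) / "dist"
    if not dist_dir.is_dir():
        return []
    return sorted(dist_dir.glob("*.whl"))


if __name__ == "__main__":
    # Hypothetical location of the PyTorch checkout; adjust to your own path.
    for wheel in find_built_wheels(Path.home() / "pytorch"):
        print(wheel)
```

The printed path is what you would pass to pip3 install after uninstalling the old torch package.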

I’m curious:
does the new wheel contain OpenBLAS with LAPACK or not?

The PyTorch 1.4 wheels that I provided use OpenBLAS, yes. When you built your own, if libopenblas-dev was installed beforehand, you should have seen config text near the beginning of the build about OpenBLAS being found:

-- Checking for [openblas]
--   Library openblas: /usr/lib/aarch64-linux-gnu/libopenblas.so

You should be able to test that OpenBLAS is working by running the following test:

import torch
A = torch.randn(2, 3, 1, 4, 4)
B = torch.randn(2, 3, 1, 4, 6)
X, LU = torch.solve(B, A)

If OpenBLAS wasn’t configured properly during the PyTorch build, you’ll get an error message like:

RuntimeError: solve: LAPACK library not found in compilation

Note that OpenBLAS is used for CPU-based BLAS/LAPACK operations, whereas it appears MAGMA is used for CUDA-based LAPACK.

Dear dusty_nv:

I have solved the issue. Please see the following Issue and its related Solution as follows.

  1. Issue:

I installed PyTorch 1.3.0 and torchvision v0.4.2 on my Jetson Nano. After running the following command:

print('cuDNN version: ' + str(torch.backends.cudnn.version()))

it shows the error as follows.

RuntimeError: cuDNN version incompatibility: PyTorch was compiled against 7500 but linked against 7301

import torch
print('cuDNN version: ' + str(torch.backends.cudnn.version()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/backends/cudnn/__init__.py", line 83, in version
    if _libcudnn() is None:
  File "/usr/local/lib/python3.6/dist-packages/torch/backends/cudnn/__init__.py", line 76, in _libcudnn
    'but linked against {}'.format(compile_version, __cudnn_version))
RuntimeError: cuDNN version incompatibility: PyTorch was compiled against 7500 but linked against 7301

I have the following environments:

JETPACK="4.2"
CUDNN="7.3…1.28-1+cuda10.0"
Python 3.6.9
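
The two numbers in the error message are packed cuDNN version codes. For the cuDNN 7.x releases discussed here the code is major*1000 + minor*100 + patch, so 7500 reads as 7.5.0 and 7301 as 7.3.1. A minimal sketch of that decoding follows; the function names are made up for illustration and are not part of PyTorch.

```python
def decode_cudnn_version(code):
    """Split a packed cuDNN 7.x version code into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def format_cudnn_version(code):
    """Render a packed version code as a dotted string, e.g. 7301 becomes '7.3.1'."""
    return ".".join(str(part) for part in decode_cudnn_version(code))


if __name__ == "__main__":
    compiled, linked = 7500, 7301  # values taken from the error message above
    print("compiled against", format_cudnn_version(compiled))
    print("linked against  ", format_cudnn_version(linked))
```

Read this way, the wheel was built against cuDNN 7.5 while the system provides cuDNN 7.3, which matches the JetPack 4.2 environment listed above.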

  2. Solution:

I solved the problem by testing the different versions.

PyTorch v1.0 - torchvision v0.2.2, compatible with cuDNN 7.3
PyTorch v1.1 - torchvision v0.3.0, compatible with cuDNN 7.5
PyTorch v1.2 - torchvision v0.4.0, compatible with cuDNN 7.5
PyTorch v1.3 - torchvision v0.4.2, compatible with cuDNN 7.5
PyTorch v1.4 - torchvision v0.5.0, not tested
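
The list above can be kept as a small lookup table. This sketch only encodes what was reported in this post for these particular wheels; it is not an official compatibility matrix, and the names are hypothetical. PyTorch v1.4 is left out because it was not tested.

```python
# PyTorch version -> (torchvision version, cuDNN version), as reported in this thread.
REPORTED_COMPAT = {
    "1.0": ("0.2.2", "7.3"),
    "1.1": ("0.3.0", "7.5"),
    "1.2": ("0.4.0", "7.5"),
    "1.3": ("0.4.2", "7.5"),
}


def pytorch_versions_for_cudnn(cudnn_version):
    """List the PyTorch versions reported to work with the given cuDNN major.minor."""
    return [pt for pt, (_, cudnn) in REPORTED_COMPAT.items() if cudnn == cudnn_version]


def torchvision_for(pytorch_version):
    """Return the torchvision version paired with a PyTorch version, or None if unlisted."""
    entry = REPORTED_COMPAT.get(pytorch_version)
    return entry[0] if entry else None


if __name__ == "__main__":
    print(pytorch_versions_for_cudnn("7.3"))
    print(torchvision_for("1.0"))
```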

I have successfully installed PyTorch v1.0 - torchvision v0.2.2, which is compatible with JetPack 4.2 including cuDNN 7.3. However, the following message appeared while setting up PyTorch v1.0 - torchvision v0.2.2:

Note: checking out 'ef768ad5664c91a19e9c6e602b2fcb08001c6125'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

Anyway, I checked both torch and torchvision with your test scripts, and they work well.

Cheers,

Mike

Dear dusty_nv:

I solved the issue by running the commands below as the superuser, since installing and upgrading “setuptools” with normal user permissions kept failing.

  1. Solution:

$ sudo su
[sudo] password for nvidia: password
root@nvidia-desktop:/home/nvidia# cd Downloads
root@nvidia-desktop:/home/nvidia/Downloads# sudo apt-get install python3-setuptools
root@nvidia-desktop:/home/nvidia/Downloads# sudo pip3 install --upgrade setuptools

  2. Issue:

The original issue was as follows.

When I tried to install torch-1.1.0 and torchvision v0.3.0, I got the following error.

ModuleNotFoundError: No module named 'setuptools.build_meta'

After uninstalling the related packages and starting from a clean environment, I tried the following methods found through Google Search.

pip3 install --upgrade setuptools
python3 -m pip install --upgrade pip setuptools wheel

But the same error remained. Running pip3 install numpy also fails. Please see the full error below.

nvidia@nvidia-desktop:~/Downloads$ pip3 install numpy torch-1.1.0-cp36-cp36m-linux_aarch64.whl
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy
Using cached numpy-1.18.1.zip (5.4 MB)
Installing build dependencies … done
Getting requirements to build wheel … done
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/cli/base_command.py", line 186, in _main
    status = self.run(options, args)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/commands/install.py", line 331, in run
    resolver.resolve(requirement_set)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/operations/prepare.py", line 516, in prepare_linked_requirement
    req, self.req_tracker, self.finder, self.build_isolation,
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/operations/prepare.py", line 95, in _get_prepared_distribution
    abstract_dist.prepare_distribution_metadata(finder, build_isolation)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/distributions/sdist.py", line 38, in prepare_distribution_metadata
    self._setup_isolation(finder)
  File "/usr/local/lib/python3.6/dist-packages/pip/_internal/distributions/sdist.py", line 96, in _setup_isolation
    reqs = backend.get_requires_for_build_wheel()
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/wrappers.py", line 152, in get_requires_for_build_wheel
    'config_settings': config_settings
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/wrappers.py", line 255, in _call_hook
    raise BackendUnavailable(data.get('traceback', ''))
pip._vendor.pep517.wrappers.BackendUnavailable: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py", line 63, in _build_backend
    obj = import_module(mod_path)
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'setuptools.build_meta'

Cheers,

Mike

Hi Mike, I would try the PyTorch v1.0 build. I think I built that with JetPack 4.2 since that would have been the latest version at the time.

Hmm I have not seen that error before, you might want to try uninstalling setuptools completely and re-installing it. Sorry I’m not of more help there.

Exactly, it seems that upgrading setuptools might resolve the issue.
There is a similar case in the GitHub issues,
and here is how it was resolved:

Which setuptools version is reported by the command below?

pip list
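
The same question can be asked from Python. This is a rough sketch that reports the setuptools version and whether the setuptools.build_meta module named in the error can be found. It only inspects the interpreter it runs under, so it may not reflect the isolated build environment pip creates; the helper names are hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the import system of this interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. setuptools itself) is missing.
        return False


def setuptools_version():
    """Return the installed setuptools version string, or None if it is not importable."""
    try:
        import setuptools
    except ImportError:
        return None
    return getattr(setuptools, "__version__", None)


if __name__ == "__main__":
    print("setuptools version:", setuptools_version())
    print("setuptools.build_meta found:", module_available("setuptools.build_meta"))
```

Run it with the same python3 that pip3 belongs to, once as the normal user and once under sudo, to see whether the two environments differ.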

While installing torchvision, I hit the exception below and had to run the commands that follow it to get past it:

/home/nvidia/apex/torchvision/torchvision/csrc/cpu/video_reader/FfmpegHeaders.h:4:10: fatal error: libavcodec/avcodec.h: No such file or directory
 #include <libavcodec/avcodec.h>
sudo apt-get install -y libavcodec-dev
sudo apt-get install -y libavformat-dev
sudo apt-get install -y libswscale-dev

The Apex PyTorch extension has been installed and turned out to work
[ reference https://github.com/NVIDIA/apex/issues/718 ]

@dusty_nv

I tried to build a Docker container on my Jetson TX2. In the Dockerfile, I followed your steps to install PyTorch v1.4.0 and torchvision v0.5.0:

RUN wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
RUN apt-get update && apt-get install python3-pip libopenblas-base
RUN apt-get update
RUN pip3 install Cython
RUN pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl

RUN apt-get install -y libjpeg-dev zlib1g-dev
RUN git clone --branch v0.5.0 https://github.com/pytorch/vision torchvision   # see below for version of torchvision to download
RUN cd torchvision
RUN python setup.py install
RUN cd ../  # attempting to load torchvision from build dir will result in import error

I always receive this error message:

python: can't open file 'setup.py': [Errno 2] No such file or directory
The command '/bin/sh -c python setup.py install' returned a non-zero code: 2

Do you have any ideas?

Thanks a lot in advance.

@suro

When I’m creating a Dockerfile I often first start a docker container from the base container you are starting from (FROM in the Dockerfile), and manually execute the different installation steps so I can verify they all work.

Apart from that, at first glance this might help:
RUN python3 ./setup.py install

Thanks for your reply, herman.jansen!

All the other installation steps worked, pytorch was installed successfully. This is the line where the build always stops:
RUN python setup.py install

Unfortunately, what you suggested (RUN python3 ./setup.py install) did not work either, but thanks for the idea anyways.

There is an explanation here that might help:
https://stackoverflow.com/questions/26392227/python-setup-py-install-does-not-work-from-dockerfile-but-i-can-go-in-the-cont

Thank you very much @Andrey1984, it worked! What a great community.

I solved my issue by replacing
RUN cd torchvision
RUN python setup.py install

with

RUN cd torchvision && python setup.py install
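
The reason this works is that every RUN line is executed in its own fresh shell, so a cd in one RUN line is gone by the next. The sketch below imitates that behaviour outside Docker with plain subprocess calls; it is an analogy rather than Docker itself, and the helper names and the temporary directory layout are made up for the demonstration.

```python
import subprocess
import tempfile
from pathlib import Path


def run_step(command, cwd):
    """Run one command in a fresh /bin/sh, similar to how each RUN line gets its own shell."""
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()


def demo():
    """Compare 'cd' as a separate step with 'cd && command' in a single step."""
    with tempfile.TemporaryDirectory() as workdir:
        project = Path(workdir) / "torchvision"
        project.mkdir()
        (project / "setup.py").write_text("print('setup.py found')\n")

        run_step("cd torchvision", workdir)           # the directory change is lost afterwards
        separate = run_step("ls setup.py", workdir)   # fails: still in workdir
        chained = run_step("cd torchvision && ls setup.py", workdir)  # succeeds
    return separate, chained


if __name__ == "__main__":
    separate, chained = demo()
    print("separate steps:", separate)
    print("chained step:  ", chained)
```

In a Dockerfile, the WORKDIR instruction is the other way to make a directory change persist across RUN lines.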

Unfortunately, the next error did not wait for long:

Step 19/29 : RUN cd torchvision && python setup.py install
 ---> Running in ccb400736e13
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import torch
ImportError: No module named torch

I think this last issue will be solved by using python3 instead of python, since the torch wheel was installed with pip3.
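
A quick way to confirm which interpreter can actually see torch is to ask each one to import it. This is only a sketch: the function name is hypothetical, and the interpreter names python and python3 are simply the two used in this thread.

```python
import subprocess


def can_import(interpreter, module):
    """Return True if `interpreter -c "import <module>"` exits successfully."""
    try:
        completed = subprocess.run(
            [interpreter, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter itself could not be started (e.g. it is not on PATH).
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    for name in ("python", "python3"):
        print(name, "can import torch:", can_import(name, "torch"))
```

Whichever interpreter reports True is the one to use for setup.py install.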