PyTorch for Jetson

Hmm I’m not sure what the issue is, sorry - from what I can tell, it doesn’t appear to happen on other datatypes or CUDA tensors. You may want to check with the PyTorch folks for a more in-depth look.

Same problem here. I have tried PyTorch 1.7.0 on an L4T 32.4.3 / Xavier NX box.

$ python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.empty(5, 3)
>>> print(x)
tensor([[ 3.6228e+12,  1.7796e-43,  5.0356e-07],
        [ 0.0000e+00, -1.0300e-19,  1.7796e-43],
        [ 4.6333e-07,  0.0000e+00, -8.1911e-20],
        [ 1.7796e-43,  7.1668e+11,  1.7796e-43],
        [ 3.6220e+12,  1.7796e-43,  7.1775e+11]])
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[1., 1., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 1.],
        [1., 0., 0.]])
>>> x = torch.tensor([5.5, 3])
>>> print(x)
tensor([6., 3.])
>>> b = torch.randn(2).cuda()
>>> print(b)
tensor([-0.3275,  1.3559], device='cuda:0')

This problem can be solved by disabling NEON, as is done in the current master branch (line 29):

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cpu/vec256/vec256_float_neon.h

Basically, the issue comes from the fact that the current NEON code doesn’t compile correctly with GCC 7, so the master branch only enables NEON for GCC > 8.3.

I did that, recompiled v1.7.1 and the problem went away. I suggest you do the same for the official build.
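By the way, if you want to check which compiler an installed wheel was built with before patching, this rough check works for me (the exact output format may vary between builds):

import torch

# torch.__config__.show() prints the build configuration, including the
# compiler version used to build the wheel, which is what determines whether
# the NEON code path was miscompiled.
print(torch.__version__)
print(torch.__config__.show())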

Thank you.

David

Ah thanks for tracking that down, David. I will cherry-pick PR #47099 from PyTorch master and rebuild/repost the wheel.

I can confirm this issue is reproducible on both the Jetson Nano 2GB and 4GB using the official SD card image (JetPack 4.4.1) and this prebuilt PyTorch 1.7.0 wheel. It is very hard for me to accept this level of bug from a huge enterprise like NVIDIA…

Code

import torch
torch.set_printoptions(precision=4)

x = [0.1, 0.2, 0.3]
x_t_cpu = torch.Tensor(x)
x_t_cuda = torch.Tensor(x).cuda()

print('x', x)
print('x_t_cpu', x_t_cpu)
print('x_t_cuda', x_t_cuda)

Output

x [0.1, 0.2, 0.3]
x_t_cpu tensor([0., 0., 0.])
x_t_cuda tensor([0.1000, 0.2000, 0.3000], device='cuda:0')

@mfkenson the issue has already been confirmed above and traced to PyTorch bug #47098, which is a regression in PyTorch 1.7 and newer. PyTorch is not an NVIDIA product and I personally build these wheels for the convenience of the community. Sorry for the inconvenience - I will be re-posting the patched wheel shortly.

OK, the updated PyTorch 1.7 wheel that fixes the bug above has been uploaded here:

The patch used for this is here: PyTorch patch for building on JetPack >= 4.4 · GitHub

It appears the fix has already been made in PyTorch master, so future releases after PyTorch 1.7.1 should not need this manual patch.
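As a quick sanity check after installing the updated wheel (just re-running the repro from above), the CPU float32 results should match the input values again:

import torch

# On the patched wheel the CPU float32 path gives correct results again;
# the broken wheel printed tensor([6., 3.]) and only 0s/1s for these.
print(torch.tensor([5.5, 3.0]))   # expect tensor([5.5000, 3.0000])
print(torch.rand(5, 3))           # expect values spread across [0, 1)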

Thank you for the build! I thought the build was official from NVIDIA. I did not mean to blame you, @dusty_nv; I am sorry. In fact, I really do appreciate your contribution. Wishing you a very merry Christmas!

No worries - it appears that testing of the ARM CPU vectorized tensor operations fell through the cracks of both the PyTorch/ATen maintainers and myself. For future releases I will be sure to test CPU ops as well. My testing to date has consisted of running a bunch of models through torchvision and making sure their inference accuracy is close to the published accuracy (script here) - that was using CUDA, though.
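Concretely, the kind of CPU check I have in mind is something along these lines (a rough sketch, not the final test script) - comparing a few elementwise ops between CPU and CUDA, since that is exactly where this regression showed up:

import torch

# Compare CPU float32 results against the CUDA results for a few elementwise
# ops; on the broken wheel the CPU side diverges (see the repro above).
x = torch.linspace(0.1, 2.0, steps=16)
for op in (torch.exp, torch.log, torch.sin, torch.sqrt):
    cpu = op(x)
    gpu = op(x.cuda()).cpu()
    assert torch.allclose(cpu, gpu, atol=1e-5), f"{op.__name__}: {cpu} vs {gpu}"
print("CPU/CUDA elementwise results match")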

Wish you and your family a wonderful holiday as well!

Has anyone here managed to build PyTorch with MAGMA? I did, and ran into a strange performance issue. Consider the following code:

import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

A = A.cuda()
B = B.cuda()

torch.solve(B, A)

When I first installed PyTorch + MAGMA, the above code took 13 minutes to finish. Throughout that time the GPU was idle and one CPU core was running at 100%. After running it once, subsequent runs drop to a normal level (a few seconds).

The issue can be reproduced in a Docker container: whenever I restart the container, the code takes 13 minutes again; after that, it takes only a few seconds to run.
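Here is roughly how I timed it (a sketch), in case anyone wants to reproduce the numbers:

import time
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).cuda()
B = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).cuda()

# Only the first solve after a fresh container start pays the huge cost;
# the second call (and subsequent runs of the script) are fast.
for i in range(2):
    start = time.time()
    torch.solve(B, A)
    torch.cuda.synchronize()
    print(f"solve call {i}: {time.time() - start:.1f} s")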

Again, this issue is specific to Jetson; it doesn’t happen with the desktop version of PyTorch.

Any idea why?

David

And this is how I built PyTorch+MAGMA:

FROM nvcr.io/nvidia/l4t-base:r32.4.4

RUN apt-get update \
    && apt-get install -y python3-pip cmake libopenblas-dev libssl-dev git gfortran

RUN pip3 install cython scikit-build

RUN pip3 install cmake

RUN pip3 install ninja

RUN wget http://icl.utk.edu/projectsfiles/magma/downloads/magma-2.5.4.tar.gz \
    && tar xf magma-2.5.4.tar.gz \
    && cd magma-2.5.4 \
    && mkdir build \
    && cd build \
    && cmake .. \
    && make -j8 \
    && make install

RUN mkdir /pytorch \
    && cd /pytorch \
    && git clone --recurse-submodules -j8 -b v1.7.0 https://github.com/pytorch/pytorch v1.7.0 \
    && cd /pytorch/v1.7.0 \
    && wget https://gist.githubusercontent.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1/raw/9d7261584a7482e7cc0fcb08a4a232c6d023f812/pytorch-1.7-jetpack-4.4.1.patch \
    && git apply pytorch-1.7-jetpack-4.4.1.patch \
    && pip3 install -r requirements.txt

ENV USE_NCCL=0
ENV USE_DISTRIBUTED=0
ENV USE_QNNPACK=0
ENV USE_PYTORCH_QNNPACK=0
ENV TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
ENV PYTORCH_BUILD_VERSION=1.7.0
ENV PYTORCH_BUILD_NUMBER=0

RUN cd /pytorch/v1.7.0 \
    && python3 setup.py bdist_wheel \
    && cd dist \
    && pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl

WORKDIR /pytorch/v1.7.0

CMD ["bash"]

Hi @dusty_nv, I followed your instructions to build PyTorch v1.6.0 from source.
My environment:

  • Hardware Platform: Drive AGX Xavier developer kit
  • Software version: DRIVE Software 10
  • Cuda 10.2
  • Python 3.6
  • cmake 3.10.2

It’s stuck at about 90% with the error below. Could you please give some advice? Thanks a lot.
Toan Le

[6/416] Linking CXX executable bin/graph_test
FAILED: bin/graph_test
: && /usr/bin/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -rdynamic caffe2/CMakeFiles/graph_test.dir/core/graph_test.cc.o -o bin/graph_test -Wl,-rpath,/home/nvidia/pytorch/build/lib:/usr/local/cuda-10.2/lib64: lib/libgtest_main.a -Wl,--no-as-needed,/home/nvidia/pytorch/build/lib/libtorch.so -Wl,--as-needed -Wl,--no-as-needed,/home/nvidia/pytorch/build/lib/libtorch_cpu.so -Wl,--as-needed lib/libprotobuf.a -Wl,--no-as-needed,/home/nvidia/pytorch/build/lib/libtorch_cuda.so -Wl,--as-needed lib/libc10_cuda.so lib/libc10.so /usr/local/cuda-10.2/lib64/libcudart.so /usr/local/cuda-10.2/lib64/libnvToolsExt.so /usr/local/cuda-10.2/lib64/libcufft.so /usr/local/cuda-10.2/lib64/libcurand.so /usr/lib/aarch64-linux-gnu/libcublas.so /usr/lib/aarch64-linux-gnu/libcudnn.so lib/libgtest.a -pthread && :
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM'
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseSpMM_bufferSize'
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateDnMat'
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseCreateCoo'
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroyDnMat'
/home/nvidia/pytorch/build/lib/libtorch_cuda.so: undefined reference to `cusparseDestroySpMat'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "setup.py", line 732, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    cmake=cmake)
  File "/home/nvidia/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/nvidia/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/home/nvidia/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.

Hi @v.toanlm4, I haven’t used the DRIVE hardware, so I’m not sure about the error, sorry about that. You might want to post to the DRIVE forums for further assistance.

At a glance, the error looks related to finding the cuSPARSE library. It appears similar to this PyTorch Issue on GitHub - you might want to check your $CUDA_HOME environment variable.
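For example, something like this (just a rough check) will show which CUDA toolkit and libcusparse the build environment would pick up, since the missing symbols (cusparseSpMM, cusparseCreateDnMat, etc.) come from cuSPARSE:

import glob
import os

# Check where CUDA_HOME points and which libcusparse libraries are visible
# there and in the system library directory used by the link line above.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
print("CUDA_HOME:", cuda_home)
print(glob.glob(os.path.join(cuda_home, "lib64", "libcusparse*")))
print(glob.glob("/usr/lib/aarch64-linux-gnu/libcusparse*"))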

Happy Holidays!
I’m validating my installation of torch and torchvision right now…
When executing “print(torchvision.__version__)”, the output is 0.8.0a0+45f960c. I installed PyTorch v1.7.0 and torchvision v0.8.1. I just wanted to check whether this behavior is normal… I think it should be displaying 0.8.1?
Thanks!

@dusty_nv Could you share the link to the instructions for building PyTorch from source?

Hi @asoong, that is what it reports for me as well, and I too have torchvision v0.8.1 installed. I am not sure if setting the following environment variable before building torchvision helps or not:

export BUILD_VERSION=0.x.0  # where 0.x.0 is the torchvision version  

Hi @SamuelWei, you can find the instructions for building PyTorch from source in the top post of this thread:

PyTorch for Jetson

Expand the Build from Source section and the instructions are in there.

Thank you, I got it!
However, this post’s tutorial doesn’t solve my problem of building PyTorch with Python 3.7.

I am experiencing the following issue.
NVIDIA Jetson NX
JetPack 4.4.1
Python 3.6.9
PyTorch for Jetson 1.7.0

python

>>> import torch
>>> torch.exp(torch.tensor([2.], dtype=torch.float, device='cpu'))
tensor([2.])
>>> torch.exp(torch.tensor([2.], dtype=torch.float, device='cuda'))
tensor([7.3891], device='cuda:0')
>>> torch.exp(torch.tensor([2.], dtype=torch.float64, device='cpu'))
tensor([7.3891], dtype=torch.float64)

Do you have any suggestions?

bash: 0.8.1: No such file or directory