PyTorch for Jetson

When I try to verify my PyTorch installation, it fails:

python3 -c "import torchvision; print(torchvision.__version__)"
Illegal instruction (core dumped)

OK, that worked, thanks!

torchlight appears to be a separate package that I am not familiar with, so you would need to make sure the correct version that your project expects is installed.

Is it possible that you upgraded pip3? If so, it may have upgraded numpy, and there is a bug in numpy v1.19.5 on ARM64 (numpy/numpy issue #18131 on GitHub: "Illegal instruction (core dumped) on import for numpy 1.19.5 on ARM64").

As a workaround, export OPENBLAS_CORETYPE=ARMV8 first.
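If the variable needs to be set from inside Python rather than the shell, it has to happen before numpy is imported, because OpenBLAS reads it only once at load time. A minimal sketch (assuming numpy is installed):

```python
import os

# Workaround for the numpy 1.19.5 "Illegal instruction" crash on ARM64:
# OpenBLAS must see this *before* numpy is imported, because it probes
# the CPU only once, when the library is first loaded.
os.environ["OPENBLAS_CORETYPE"] = "ARMV8"

import numpy as np  # without the line above, this import would crash
print(np.__version__)
```

Setting the variable after numpy has already been imported has no effect.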

Thanks. It worked

Are you familiar with h5py? When I try to install it, pip keeps erroring out:
"Using cached h5py-3.1.0.tar.gz (371 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... error
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-tiye2d9d/normal --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'numpy==1.19.3; python_version >= "3.9"' 'Cython>=0.29.14; python_version >= "3.8"' 'numpy==1.12; python_version == "3.6"' 'numpy==1.17.5; python_version == "3.8"' 'Cython>=0.29; python_version < "3.8"' pkgconfig 'numpy==1.14.5; python_version == "3.7"'
cwd: None
Complete output (719 lines):
Ignoring numpy: markers 'python_version >= "3.9"' don't match your environment
Ignoring numpy: markers 'python_version == "3.6"' don't match your environment
Ignoring Cython: markers 'python_version < "3.8"' don't match your environment
Ignoring numpy: markers 'python_version == "3.7"' don't match your environment
Collecting Cython>=0.29.14
Using cached Cython-0.29.22-py2.py3-none-any.whl (980 kB)
Collecting numpy==1.17.5
Using cached numpy-1.17.5.zip (6.4 MB)
Collecting pkgconfig
Using cached pkgconfig-1.5.2-py2.py3-none-any.whl (6.4 kB)
Building wheels for collected packages: numpy
Building wheel for numpy (setup.py): started
Building wheel for numpy (setup.py): finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5uzaorju/numpy_24ecee83328646b09eb3ad01f0d03151/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5uzaorju/numpy_24ecee83328646b09eb3ad01f0d03151/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-v0_2az6f
cwd: /tmp/pip-install-5uzaorju/numpy_24ecee83328646b09eb3ad01f0d03151/
Complete output (341 lines):
Running from numpy source directory.
blas_opt_info:
blas_mkl_info:
customize UnixCCompiler
libraries mkl_rt not found in ['/usr/local/lib', '/usr/lib', '/usr/lib/aarch64-linux-gnu']
NOT AVAILABLE
"

Sorry, I am not familiar with h5py, although I think you can install it with sudo apt-get install python3-h5py

You may also want to try this suggestion to install BLAS/LAPACK first:

sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran

If that still doesn’t fix it, you may want to post a new topic about it. Thanks.
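As an aside, the "Ignoring numpy: markers ... don't match your environment" lines in that log are pip evaluating each pinned requirement's environment marker against the running interpreter and dropping the ones that don't apply. A rough stdlib-only illustration of the selection logic (the function name is mine; pip's real marker evaluation follows PEP 508 and is richer):

```python
import sys

def select_requirements(ver):
    """Pick the numpy pin whose version marker matches interpreter `ver`.

    `ver` is a (major, minor) tuple such as sys.version_info[:2]. This
    mirrors why the h5py build above selected numpy==1.17.5 on Python 3.8.
    """
    candidates = [
        ("numpy==1.19.3", ver >= (3, 9)),   # python_version >= "3.9"
        ("numpy==1.17.5", ver == (3, 8)),   # python_version == "3.8"
        ("numpy==1.12",   ver == (3, 6)),   # python_version == "3.6"
        ("numpy==1.14.5", ver == (3, 7)),   # python_version == "3.7"
    ]
    return [req for req, matches in candidates if matches]

print(select_requirements((3, 8)))  # -> ['numpy==1.17.5']
print(select_requirements(sys.version_info[:2]))
```

Comparing (major, minor) tuples rather than version strings avoids the classic "3.10" < "3.9" string-comparison trap.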

Hi all, hoping you can help me. I am trying to get torch and torchvision installed on my new Jetson Nano with Python version 3.6.7. I successfully installed torch and can import torch with no errors, but I am stuck on installing torchvision. When I try python setup.py install --user I receive the following error:

Building wheel torchvision-0.7.0a0+78ed10c
running install
running bdist_egg
running egg_info
writing torchvision.egg-info/PKG-INFO
writing dependency_links to torchvision.egg-info/dependency_links.txt
writing requirements to torchvision.egg-info/requires.txt
writing top-level names to torchvision.egg-info/top_level.txt
/home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/utils/cpp_extension.py:335: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'torchvision.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
writing manifest file 'torchvision.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-aarch64/egg
running install_lib
running build_py
copying torchvision/version.py -> build/lib.linux-aarch64-3.6/torchvision
running build_ext
building 'torchvision.video_reader' extension
/home/bricklayer/archiconda3/envs/aislebrain/bin/aarch64-conda_cos7-linux-gnu-cc -DNDEBUG -fwrapv -O3 -Wall -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -DNDEBUG -D_FORTIFY_SOURCE=2 -O3 -fPIC -I/home/bricklayer/torchvision/torchvision/csrc/cpu/decoder -I/home/bricklayer/torchvision/torchvision/csrc/cpu/video_reader -I/usr/include -I/home/bricklayer/torchvision/torchvision/csrc -I/home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include -I/home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/TH -I/home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/THC -I/home/bricklayer/archiconda3/envs/aislebrain/include/python3.6m -c /home/bricklayer/torchvision/torchvision/csrc/cpu/video_reader/VideoReader.cpp -o build/temp.linux-aarch64-3.6/home/bricklayer/torchvision/torchvision/csrc/cpu/video_reader/VideoReader.o -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=video_reader -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /home/bricklayer/archiconda3/envs/aislebrain/aarch64-conda_cos7-linux-gnu/include/c++/7.3.0/cwchar:44:0,
                 from /home/bricklayer/archiconda3/envs/aislebrain/aarch64-conda_cos7-linux-gnu/include/c++/7.3.0/bits/postypes.h:40,
                 from /home/bricklayer/archiconda3/envs/aislebrain/aarch64-conda_cos7-linux-gnu/include/c++/7.3.0/iosfwd:40,
                 from /home/bricklayer/archiconda3/envs/aislebrain/aarch64-conda_cos7-linux-gnu/include/c++/7.3.0/memory:72,
                 from /home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:4,
                 from /home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/bricklayer/archiconda3/envs/aislebrain/lib/python3.6/site-packages/torch/include/torch/script.h:3,
                 from /home/bricklayer/torchvision/torchvision/csrc/cpu/video_reader/VideoReader.h:3,
                 from /home/bricklayer/torchvision/torchvision/csrc/cpu/video_reader/VideoReader.cpp:1:
/usr/include/wchar.h:27:10: fatal error: bits/libc-header-start.h: No such file or directory
 #include <bits/libc-header-start.h>
          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/home/bricklayer/archiconda3/envs/aislebrain/bin/aarch64-conda_cos7-linux-gnu-cc' failed with exit status 1

I did some searching and tried to fix it with sudo apt-get install gcc-multilib g++-multilib, but had no luck. It responded with:

Package gcc-multilib is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'gcc-multilib' has no installation candidate
E: Unable to locate package g++-multilib
E: Couldn't find any package by regex 'g++-multilib'

Any ideas how to fix this?

Hi @jetson_mason, are you inside an archiconda environment? I see some references to it in your error logs - I haven't tried building with conda before. If so, can you try building it outside of the conda environment with python3?

Hi @dusty_nv, good catch - that did the trick. I installed python3.6 outside of conda and tried again. It built successfully, and I can run import torchvision without any errors now. Thanks!

I built a wheel for PyTorch 1.7 with Python 3.8 on a Jetson Nano.
@dusty_nv Please verify and confirm it for other people.

Download at: torch-1.7.0a0-cp38-cp38-linux_aarch64.whl - Google Drive


I'm trying to compile PyTorch 1.8 (rc4) or 1.9 (dev), and every time I try to compile either version, GCC crashes when building torch_cuda_generated_BinaryMulDivKernel.cu.o. I tried GCC 7 and GCC 8. Here is the full command that causes the crash (if I run it manually the crash happens too, so it is reproducible):

cd /home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda && /usr/bin/cmake -E make_directory /home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_BinaryMulDivKernel.cu.o -D generated_cubin_file:STRING=/home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_BinaryMulDivKernel.cu.o.cubin.txt -P /home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake /home/lissanro/Documents/pkgs/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_BinaryMulDivKernel.cu.o

I have exported the variables recommended in the original post, and I have also tried some other environment variables; the outcome is always the same. Even with BUILD_CAFFE2_OPS=0 BUILD_CAFFE2=0 the crash still happens. I did not try disabling CUDA because I need it.

The error in the terminal is very long, so I quote here only the end:

31036: #pragma GCC diagnostic pop
31036: # 2 "tmpxft_00007761_00000000-5_BinaryMulDivKernel.cudafe1.stub.c" 2
31036: # 1 "tmpxft_00007761_00000000-5_BinaryMulDivKernel.cudafe1.stub.c"
=== END GCC DUMP ===
CMake Error at torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake:281 (message):
  Error generating file

The beginning of the crash log (full log):

ProblemType: Crash
Date: Sun Feb 28 10:45:00 2021
ExecutablePath: /usr/lib/gcc/aarch64-linux-gnu/7/cc1plus
PreprocessedSource:
 // Target: aarch64-linux-gnu
 // Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
 // Thread model: posix
 // gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 
 // 
 // /usr/include/c++/7/cmath: In static member function ‘static scalar_t at::native::div_floor_kernel_cuda(at::TensorIterator&)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t)>::_FUN(scalar_t, scalar_t)’:

Any ideas how to solve this or what else to try?

Hi @Lissanro, I haven’t tried to build these yet - is it perhaps related to this PyTorch PR? https://github.com/pytorch/pytorch/pull/51834#discussion_r572391220

If not, can you file an issue about it on PyTorch GitHub and link to it here?

I will try the PR you linked tomorrow. Building PyTorch is very slow and takes a whole day, so it will take a while before I can confirm whether it helped.

In the meantime I have found the following workaround. First, I use git clean -fdx to get rid of any old build files. Then I start building the wheel:

MAX_JOBS=4 BUILD_TESTS=0 TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2" USE_NCCL=0 USE_QNNPACK=0 USE_DISTRIBUTED=0 USE_PYTORCH_QNNPACK=0 USE_OPENCV=1 USE_FFMPEG=1 USE_LMDB=1 python3 setup.py bdist_wheel

As soon as CMake finishes initial configuration and prints “Build files have been written”, I apply the following patch:

--- build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake.orig 2021-03-01 07:41:52.859595866 +0000
+++ build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake      2021-03-01 07:44:29.482522544 +0000
@@ -114,10 +114,8 @@

 # Take the compiler flags and package them up to be sent to the compiler via -Xcompiler
 set(nvcc_host_compiler_flags "")
-# If we weren't given a build_configuration, use Debug.
-if(NOT build_configuration)
-  set(build_configuration Debug)
-endif()
+# Force Debug build_configuration to work around the bug in the GCC 7 and GCC 8 compilers (https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-7-0-now-available/72048/712)
+set(build_configuration Debug)
 string(TOUPPER "${build_configuration}" build_configuration)
 #message("CUDA_NVCC_HOST_COMPILER_FLAGS = ${CUDA_NVCC_HOST_COMPILER_FLAGS}")
 foreach(flag ${CMAKE_HOST_FLAGS} ${CMAKE_HOST_FLAGS_${build_configuration}})
--- build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_CopysignKernel.cu.o.Release.cmake.orig     2021-03-01 07:41:52.880596399 +0000
+++ build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_CopysignKernel.cu.o.Release.cmake  2021-03-01 11:50:47.040069695 +0000
@@ -114,10 +114,8 @@
 
 # Take the compiler flags and package them up to be sent to the compiler via -Xcompiler
 set(nvcc_host_compiler_flags "")
-# If we weren't given a build_configuration, use Debug.
-if(NOT build_configuration)
-  set(build_configuration Debug)
-endif()
+# Force Debug build_configuration to work around the bug in the GCC 7 and GCC 8 compilers (https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-7-0-now-available/72048/712)
+set(build_configuration Debug)
 string(TOUPPER "${build_configuration}" build_configuration)
 #message("CUDA_NVCC_HOST_COMPILER_FLAGS = ${CUDA_NVCC_HOST_COMPILER_FLAGS}")
foreach(flag ${CMAKE_HOST_FLAGS} ${CMAKE_HOST_FLAGS_${build_configuration}})

It forces the Debug build configuration for torch_cuda_generated_BinaryMulDivKernel.cu.o and torch_cuda_generated_CopysignKernel.cu.o (each of which causes a compiler crash when the build configuration is set to Release). This way I was able to build PyTorch 1.9 (the workaround should work for 1.8 too).
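For anyone who prefers scripting the edit over keeping a patch file around, here is a hypothetical helper (mine, not part of PyTorch's build system) that rewrites a generated .cmake file the same way. It leaves the old comment line in place and replaces only the conditional:

```python
import pathlib
import re

def force_debug(cmake_path):
    """Unconditionally set build_configuration to Debug in a generated
    torch_cuda_generated_*.cu.o.Release.cmake file, mirroring the patch."""
    path = pathlib.Path(cmake_path)
    text = path.read_text()
    # Replace the conditional Debug default with an unconditional setting.
    patched = re.sub(
        r"if\(NOT build_configuration\)\s*"
        r"set\(build_configuration Debug\)\s*"
        r"endif\(\)",
        "set(build_configuration Debug)",
        text,
    )
    path.write_text(patched)
```

Run it on the BinaryMulDivKernel and CopysignKernel .Release.cmake files as soon as CMake reports that the build files have been written.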

I expect the patch from the PR you mentioned will at least solve the issue with CopysignKernel and allow it to compile in the Release build configuration. I am not sure yet whether it will help with the BinaryMulDivKernel issue; I will report back as soon as I know whether the PR solves the problem fully or partially.

Hey! I'm trying to get PyTorch installed on my Jetson TX2 but have failed so far. My Python version is 3.6.13 and my JetPack version is 4.4.1 [L4T 32.4.4]. I have successfully installed pip, libopenblas-base, libopenmpi-dev and Cython. When I try to pip install torch1.7.0xxx.whl:

ERROR: torch has an invalid wheel, could not read 'torch-1.7.0.dist-info/WHEEL' file: BadZipFile('Bad magic number for file header',)

Then I try to install torch 1.6.0 / 1.5.0 / 1.4.0:

ERROR: Exception:
Traceback (most recent call last):
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 189, in _main
    status = self.run(options, args)
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 178, in wrapper
    return func(self, options, args)
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 317, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 101, in resolve
    req, requested_extras=(),
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 306, in make_requirement_from_install_req
    version=None,
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 169, in _make_candidate_from_link
    name=name, version=version,
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in __init__
    version=version,
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 144, in __init__
    self.dist = self._prepare()
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 226, in _prepare
    dist = self._prepare_distribution()
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 312, in _prepare_distribution
    self._ireq, parallel_builds=True,
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 457, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 501, in _prepare_linked_requirement
    req, self.req_tracker, self.finder, self.build_isolation,
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 67, in _get_prepared_distribution
    return abstract_dist.get_pkg_resources_distribution()
  File "/home/tx2/.local/lib/python3.6/site-packages/pip/_internal/distributions/wheel.py", line 30, in get_pkg_resources_distribution
    with ZipFile(self.req.local_file_path, allowZip64=True) as z:
  File "/home/tx2/archiconda3/envs/py2/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/home/tx2/archiconda3/envs/py2/lib/python3.6/zipfile.py", line 1226, in _RealGetContents
    raise BadZipFile("Bad magic number for central directory")
zipfile.BadZipFile: Bad magic number for central directory

Can someone help me out? I really appreciate it! Many thanks!


I have tried the PR. It did not work at first. I had to replace the following in 4 places:

(__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ > 3))

With this:

(__GNUC__ > 8)

To make it work. I guess somebody assumed the bug would be fixed in GCC versions above 8.3, but even with 8.4.0 it still crashes. Here is an updated patch that can be applied to the current PyTorch: http://Dragon.Studio/2021/03/51834.diff. On top of this patch, the patch for issue #8103 is still necessary too.

I left a comment on PyTorch PR #51834 about this, to let them know that with GCC 8.4 the workaround is still necessary, otherwise the compiler will crash.

error
zipfile.BadZipFile: Bad magic number for central directory

Hi @329992704, @Jackey_S, I just re-downloaded and re-installed the PyTorch 1.7 wheel, and did not get this file corruption error. Can you try downloading the wheel again? Perhaps it was a connection issue or a temporary problem with Box.com.
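One way to tell a truncated download apart from a pip problem is to check the file directly - a .whl is just a zip archive, so a corrupted one fails to open. A small sketch (the filename in the usage comment is illustrative):

```python
import zipfile

def wheel_is_intact(path):
    """Return True if the wheel opens as a zip and every member's CRC checks out.

    A partial or corrupted download raises BadZipFile, which is the same
    "Bad magic number" error that pip reports above.
    """
    try:
        with zipfile.ZipFile(path) as whl:
            return whl.testzip() is None  # None means all CRCs matched
    except zipfile.BadZipFile:
        return False

# e.g. wheel_is_intact("torch-1.7.0-cp36-cp36m-linux_aarch64.whl")
```

If this returns False, re-download the wheel before blaming pip.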

When will 1.8 be available?

I will try to build it today and will report back.


OK, the PyTorch 1.8.0 wheel is posted here:

It needed this patch to build, which includes the fixes that @Lissanro mentioned.
