PyTorch for Jetson

How do I git checkout a specific version? As in:

git clone https://github.com/pytorch/pytorch.git
cd pytorch
git fetch --tags
git checkout v1.4.0
git submodule update --init --recursive
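
A quick sanity check that the checkout actually landed on the tag:

git describe --tags   # should print v1.4.0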

Hi guys,
I'm trying to build PyTorch 1.5.0 from source, and I want to enable QNNPACK in my build,
so I set both USE_QNNPACK=1 and USE_PYTORCH_QNNPACK=1.
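
Concretely, this is roughly what I set before running the build (the setup.py invocation is just the usual source-build command, as an example):

export USE_QNNPACK=1
export USE_PYTORCH_QNNPACK=1
python3 setup.py bdist_wheel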

But my build keeps failing:

FAILED: confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o

Can you give me some advice?

If there's already an old question about this, I'm sorry for asking again.

PS: does anybody know why the QNNPACK engine works with PyTorch 1.4.0 but not with later versions?


Hi @fredrickangws, I’m not sure how to fix the QNNPACK build error in PyTorch >= 1.5 on aarch64, so I have disabled it. I recommend posting an issue about it to the PyTorch GitHub if you require QNNPACK.


Is it possible to get a torch build with USE_NNPACK=1? I can't seem to build even the simplest plain torch. Do you have a build container/Docker image that you can share?

Hello guys!

I am trying to install torchvision 0.7.0.
After following the official instructions, I get a torch import error, although torch is successfully installed; in fact, I can import torch only from the Python terminal. Do you know why?

(env) stas@ai:~$ cd torchvision
(env) stas@ai:~/torchvision$ sudo python3 setup.py install
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'
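
Could it be that sudo python3 is not using the virtualenv's Python? Just a guess, but comparing these two might show it:

python3 -c "import sys; print(sys.executable)"
sudo python3 -c "import sys; print(sys.executable)"

If they print different paths, torch installed inside (env) would not be visible to the sudo build.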

Thank you

Hi @dusty_nv, first of all thanks for replying.
As you suggested, I've done some digging on the PyTorch GitHub
and I found this:

https://github.com/pytorch/pytorch/issues/33124#issuecomment-602048845

I applied this patch manually and it worked.

For the record, my device is a Jetson Xavier and I installed PyTorch 1.5.0 from source.
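
In case it is useful to anyone else: I saved the diff from that comment to a local file and applied it in the PyTorch source tree (the file name here is just what I called it, not something from the issue):

git apply qnnpack-fix.patch   # qnnpack-fix.patch = the diff from the linked comment, saved locally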

Thanks again!

Install PyTorch from the instructions above.

I recall getting build errors when USE_NNPACK was enabled (similar to USE_QNNPACK), so I disabled it. There may be some fixes required upstream in PyTorch, if you care to file an issue on their GitHub about it.
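
i.e. I export the following before running setup.py:

export USE_NNPACK=0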

@dusty_nv Thanks for the response. I was able to build the NNPACK package on the Xavier without issue, which is why I asked. I had seen somewhere the option to build PyTorch with USE_SYSTEM_NNPACK or something like that, and was hoping this might be an option too, but with the build from source failing for me, I'm rather stuck.

What is the build error that you are getting?

@dusty_nv It always seems to fail here:

[2170/4221] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o
/usr/bin/c++  -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.6m -isystem /Firecuda/espnet/tools/venv/lib/python3.6/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I/usr/local/cuda/include -I../torch/csrc/api -I../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../torch/csrc -I../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../caffe2/core/nomnigraph/include -isystem include -I../third_party/FXdiv/include -I../c10/.. -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/NNPACK/include -I../third_party/FP16/include -I../third_party/tensorpipe -Ithird_party/tensorpipe -I../third_party/fmt/include -I/usr/local/cuda-10.2//include -I/usr/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/GridSampler.cpp.o -c ../aten/src/ATen/native/GridSampler.cpp
In file included from ../aten/src/ATen/cpu/vec256/vec256.h:10:0,
                 from ../aten/src/ATen/cpu/vec256/functional.h:6,
                 from ../aten/src/ATen/cpu/vml.h:5,
                 from ../aten/src/ATen/native/GridSampler.cpp:7:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:262:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) const {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:267:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In static member function ‘static at::vec256::{anonymous}::Vec256<float> at::vec256::{anonymous}::Vec256<float>::loadu(const void*, int64_t)’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In member function ‘void at::vec256::{anonymous}::Vec256<float>::store(void*, int64_t) const’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
       vst1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
       vst1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In function ‘at::vec256::{anonymous}::Vec256<T> at::vec256::{anonymous}::operator&(const at::vec256::{anonymous}::Vec256<T>&, const at::vec256::{anonymous}::Vec256<T>&) [with T = float; typename std::enable_if<(! std::is_base_of<at::vec256::{anonymous}::Vec256i, at::vec256::{anonymous}::Vec256<T> >::value), int>::type <anonymous> = 0]’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:577:42: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_low())));
                                          ^
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:580:43: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_high())));
                                           ^
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In function ‘at::vec256::{anonymous}::Vec256<T> at::vec256::{anonymous}::operator|(const at::vec256::{anonymous}::Vec256<T>&, const at::vec256::{anonymous}::Vec256<T>&) [with T = float; typename std::enable_if<(! std::is_base_of<at::vec256::{anonymous}::Vec256i, at::vec256::{anonymous}::Vec256<T> >::value), int>::type <anonymous> = 0]’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:588:42: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_low())));
                                          ^
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:591:43: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_high())));
                                           ^
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In function ‘at::vec256::{anonymous}::Vec256<T> at::vec256::{anonymous}::operator^(const at::vec256::{anonymous}::Vec256<T>&, const at::vec256::{anonymous}::Vec256<T>&) [with T = float; typename std::enable_if<(! std::is_base_of<at::vec256::{anonymous}::Vec256i, at::vec256::{anonymous}::Vec256<T> >::value), int>::type <anonymous> = 0]’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:599:42: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_low())));
                                          ^
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:602:43: error: cannot convert ‘uint32x4_t {aka __vector(4) unsigned int}’ to ‘float32x4_t {aka __vector(4) float}’ for argument ‘1’ to ‘uint32x4_t vreinterpretq_u32_f32(float32x4_t)’
       vreinterpretq_u32_f32(b.get_high())));
                                           ^
[2177/4221] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/NaiveConvolutionTranspose2d.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "setup.py", line 737, in <module>
    build_deps()
  File "setup.py", line 321, in build_deps
    cmake=cmake)
  File "/Firecuda/espnet/tools/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/Firecuda/espnet/tools/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/Firecuda/espnet/tools/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 1.

Hmm I don’t believe I’ve seen that error before - what version of PyTorch are you compiling?

Are you setting the following environment variables in your terminal before you build?

$ export USE_NCCL=0
$ export USE_DISTRIBUTED=0                # skip setting this if you want to enable OpenMPI backend
$ export USE_QNNPACK=0
$ export USE_PYTORCH_QNNPACK=0
$ export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
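
After CMake configures, you can also double-check that these actually took effect, e.g. (assuming the default build/ directory):

grep -E 'USE_NCCL|USE_QNNPACK|USE_PYTORCH_QNNPACK' build/CMakeCache.txt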

I used the exact same instructions from here; maybe I'm missing something on the git clone --branch?

The instructions from here leave it open-ended as to which PyTorch version you want to build - what is the git clone command that you use?

I used git clone --recursive https://github.com/pytorch/pytorch.git

I see - that means that this is a problem with PyTorch master (which is typically in flux). Instead, please clone the PyTorch version 1.6.0 release:

git clone --recursive --branch v1.6.0 https://github.com/pytorch/pytorch.git
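
(The release tags all carry a leading v; you can list them without cloning, for example:)

git ls-remote --tags https://github.com/pytorch/pytorch.git | grep v1.6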

Thanks, I was missing the v, but just figured it out.

@fredrickangws @dusty_nv I'm glad to report that building PyTorch with USE_NNPACK=1 completed successfully. Here is the wheel if you want to test whether it works for you: torch-1.6.0-cp36-cp36m-linux_aarch64.whl (MD5: 6ebf1d5501170cb008b1191678b57be0) - File on MEGA

Hello Dusty and Andrey,

Thank you for your useful advice.
I tested this on an NVIDIA Jetson TX2 8GB (JetPack 4.4) and an NVIDIA Jetson TX2 4GB (R32 rev2.1, which I believe is JetPack 4.0), and built on the TX2 4GB. I checked the logs of my installation and, even though I was setting the environment variables for the installation script "setup.py" in the terminal, the script did not use them, so this is not hardware-dependent but a software issue in PyTorch v1.6.0-rc7 (b31f58).

These environment variables were:

$ export USE_NCCL=0
$ export USE_QNNPACK=0
$ export USE_PYTORCH_QNNPACK=0
$ export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"

$ export PYTORCH_BUILD_VERSION= # without the leading 'v', e.g. 1.3.0 for PyTorch v1.3.0
$ export PYTORCH_BUILD_NUMBER=1
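
One possible explanation I considered (an assumption on my part, not something I verified): if a build/ directory from an earlier attempt already exists, its CMakeCache.txt keeps the old values and the fresh environment variables are ignored, so wiping it first may help:

$ rm -rf build   # assumption: removes the stale CMakeCache.txt so new env vars are picked up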

Consequently, an option was to modify the CMakeLists.txt in the PyTorch root folder.

--- CMakeLists.txt
+++ CMakeLists.txt
Line 159:
 option(USE_METAL "Use Metal for iOS build" ON)
 option(USE_NATIVE_ARCH "Use -march=native" OFF)
 cmake_dependent_option(
-    USE_NCCL "Use NCCL" ON
+    USE_NCCL "Use NCCL" OFF
     "USE_CUDA OR USE_ROCM;UNIX;NOT APPLE" OFF)
...
Line 182:
 option(USE_OPENMP "Use OpenMP for parallel code" ON)
 option(USE_PROF "Use profiling" OFF)
-option(USE_QNNPACK "Use QNNPACK (quantized 8-bit operators)" ON)
+option(USE_QNNPACK "Use QNNPACK (quantized 8-bit operators)" OFF)
-option(USE_PYTORCH_QNNPACK "Use ATen/QNNPACK (quantized 8-bit operators)" ON)
+option(USE_PYTORCH_QNNPACK "Use ATen/QNNPACK (quantized 8-bit operators)" OFF)

After that, I built on an external disk with:

$ sudo nvpmodel -m 4
$ sudo python3 setup.py build
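
(If you want a reusable artifact instead of an in-tree build, building a wheel should also work - a minimal sketch:)

$ python3 setup.py bdist_wheel
$ pip3 install dist/torch-*.whl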

The log summary was pretty similar to the one Dusty provided:

-- ******** Summary ********
-- General:
--   CMake version         : 3.14.7
--   CMake command         : /usr/local/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 7.4.0
--   BLAS                  : MKL
--   CXX flags             :  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /usr/lib/python3/dist-packages;/usr/local/cuda
--   CMAKE_INSTALL_PREFIX  : /usr/local
-- 
--   TORCH_VERSION         : 1.6.0
--   CAFFE2_VERSION        : 1.6.0
--   BUILD_CAFFE2_MOBILE   : OFF
--   USE_STATIC_DISPATCH   : OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : OFF
--     Protobuf compiler   : /usr/local/bin/protoc
--     Protobuf includes   : /usr/local/include
--     Protobuf libraries  : /usr/local/lib/libprotobuf.so;-pthread
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : False
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_SHARED_LIBS     : ON
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_ASAN              : OFF
--   USE_CUDA              : ON
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 10.0
--     cuDNN version       : 7.5.0
--     CUDA root directory : /usr/local/cuda
--     CUDA library        : /usr/local/cuda/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda/lib64/libcudart.so
--     cublas library      : /usr/local/cuda/lib64/libcublas.so
--     cufft library       : /usr/local/cuda/lib64/libcufft.so
--     curand library      : /usr/local/cuda/lib64/libcurand.so
--     cuDNN library       : /usr/lib/aarch64-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda/include
--     NVCC executable     : /usr/local/cuda/bin/nvcc
--     NVCC flags          : -DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_62,code=sm_62;-Xcudafe;--diag_suppress=cc_clobber_ignored;-Xcudafe;--diag_suppress=integer_sign_change;-Xcudafe;--diag_suppress=useless_using_declaration;-Xcudafe;--diag_suppress=set_but_not_used;-Xcudafe;--diag_suppress=field_without_dll_interface;-Xcudafe;--diag_suppress=base_class_has_different_dll_interface;-Xcudafe;--diag_suppress=dll_interface_conflict_none_assumed;-Xcudafe;--diag_suppress=dll_interface_conflict_dllexport_assumed;-Xcudafe;--diag_suppress=implicit_return_from_non_void_function;-Xcudafe;--diag_suppress=unsigned_compare_with_zero;-Xcudafe;--diag_suppress=declared_but_not_referenced;-Xcudafe;--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-gencode;arch=compute_62,code=sm_62;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : True
--   USE_OBSERVERS         : OFF
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : OFF
--   USE_PYTORCH_QNNPACK   : OFF
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   Public Dependencies  : Threads::Threads
--   Private Dependencies : pthreadpool;cpuinfo;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
-- Generating done

I will build on the Jetson TX2 8GB with JetPack 4.4 and update you. Thanks very much for your help, Dusty and Andrey!

My installation of torchvision gets stuck at step 3/13.

  • Python 3.6
  • Torch 1.6
  • TorchVision 0.7

Any suggestions?

[3/13] c++ -MMD -MF /home/izertis/torchvision/build/temp.linux-aarch64-3.6/home/izertis/torchvision/torchvision/csrc/cpu/PSROIAlign_cpu.o.d -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/home/izertis/torchvision/torchvision/csrc -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/TH -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/izertis/.virtualenvs/py3torch/include -I/usr/include/python3.6m -c -c /home/izertis/torchvision/torchvision/csrc/cpu/PSROIAlign_cpu.cpp -o /home/izertis/torchvision/build/temp.linux-aarch64-3.6/home/izertis/torchvision/torchvision/csrc/cpu/PSROIAlign_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
In file included from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/Parallel.h:149:0,
                 from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/TH/THTensorApply.h:4,
                 from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/TH/THTensor.h:5,
                 from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/TH/TH.h:13,
                 from /home/izertis/torchvision/torchvision/csrc/cpu/PSROIAlign_cpu.cpp:3:
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/ParallelOpenMP.h:84:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
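
PS: would limiting the parallel jobs help? I understand (though I have not confirmed it) that the PyTorch extension builder honors MAX_JOBS, so something like this might rule out memory exhaustion during the build:

export MAX_JOBS=2                  # assumption: honored by the PyTorch extension builder
sudo -E python3 setup.py install   # -E so the variable survives sudo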