PyTorch for Jetson

@user33662 hmm I don’t see what the actual error is from your log (only that there was some error)

I’m also not very familiar with develop mode, so I’m not sure what the issue is - sorry about that. Were there more detailed error messages available?

Hi @dusty_nv,
Thanks a lot for replying. There is another error message before it but I am not sure if it helps:

[1136/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorRandom.cu.o
[1137/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o
FAILED: caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o
cd /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC && /usr/bin/cmake -E make_directory /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o -D generated_cubin_file:STRING=/home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o.cubin.txt -P /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o.Release.cmake
Killed
CMake Error at torch_generated_THCTensorIndex.cu.o.Release.cmake:281 (message):
  Error generating file
  /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o


[1138/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorMathPairwise.cu.o
[1139/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorMathScan.cu.o
[1140/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorScatterGather.cu.o
[1141/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorSort.cu.o
[1142/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorTopK.cu.o
ninja: build stopped: subcommand failed.
-- Building version 1.4.0
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/wanderer/Desktop/workspace/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3/dist-packages -DNUMPY_INCLUDE_DIR=/usr/lib/python3/dist-packages/numpy/core/include -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include/python3.6m -DPYTHON_LIBRARY=/usr/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.4.0 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/wanderer/Desktop/workspace/pytorch
cmake --build . --target install --config Release -- -j 6
Traceback (most recent call last):
  File "setup.py", line 755, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    cmake=cmake)
  File "/home/wanderer/Desktop/workspace/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 335, in build
    self.run(build_args, my_env)
  File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.

Please let me know which files I should check for more detailed errors, if required.
Thank you!

OK, so the Killed message typically means that your board ran out of memory while compiling. Try mounting swap memory like this:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap
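For reference, a minimal sketch of what the linked guide walks through (the size and path here are just examples; see the guide for the full steps):

# allocate a 4GB swap file (size/path are examples)
sudo fallocate -l 4G /mnt/4GB.swap
sudo chmod 600 /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap
# verify the swap is active
swapon --show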

Cannot install libopenmpi-dev
I ran into this issue while installing:
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev

The following packages have unmet dependencies:
 libibverbs-dev : Depends: libibverbs1 (= 17.1-1) but 17.1-1ubuntu0.2 is to be installed
 libopenmpi-dev : Depends: libhwloc-dev but it is not going to be installed
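A common way to inspect this kind of pinned-version conflict, in case it helps anyone hitting the same thing (the version string below is copied from the error above and may differ on your system):

# show which versions of the runtime library are installed/available
apt-cache policy libibverbs1
# try installing the exact version that libibverbs-dev expects
sudo apt-get install libibverbs1=17.1-1
# then retry the original install
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev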

Thanks a lot @dusty_nv, the error was resolved by adding a swap.
But I am still getting some errors due to cuDNN and Caffe2, so I re-flashed the Jetson TX2 and started the steps again, and I am hitting the same errors. Some of the errors are:

[2357/2833] Building CXX object caffe2/CMakeFiles/torch.dir/operators/conv_op_cache_cudnn.cc.o
[2358/2833] Building CXX object caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o
FAILED: caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o 
/usr/bin/c++  -DAT_PARALLEL_OPENMP=1 -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_CUDA -D_FILE_OFFSET_BITS=64 -Dtorch_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.6m -isystem /usr/lib/python3/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /opt/rocm/hip/include -isystem /include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -I../caffe2/../torch/csrc/api -I../caffe2/../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -I../caffe2/../torch/../aten/src -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../caffe2/../torch/csrc -I../caffe2/../torch/../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -isystem include -Icaffe2/aten/src/THC -I../aten/src/THC -I../aten/src/THCUNN -I../aten/src/ATen/cuda -I../c10/.. -I../third_party/NNPACK/include -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/FP16/include -I../c10/cuda/../.. -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow -O3  -fPIC   -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -DCAFFE2_BUILD_MAIN_LIB -std=gnu++11 -MD -MT caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o -MF caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o.d -o caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o -c ../caffe2/operators/conv_op_cudnn.cc
In file included from ../caffe2/core/context_gpu.h:20:0,
                 from ../caffe2/operators/conv_op_cudnn.cc:4:
../caffe2/operators/conv_op_cudnn.cc: In member function ‘bool caffe2::CudnnConvOp::DoRunWithType()’:
../caffe2/operators/conv_op_cudnn.cc:760:11: error: ‘CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope
           CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,

....

../caffe2/operators/conv_op_cudnn.cc:754:21: error: there are no arguments to ‘cudnnGetConvolutionForwardAlgorithm’ that depend on a template parameter, so a declaration of ‘cudnnGetConvolutionForwardAlgorithm’ must be available [-fpermissive]
       CUDNN_ENFORCE(cudnnGetConvolutionForwardAlgorithm(

....

../caffe2/operators/conv_op_cudnn.cc: In member function ‘bool caffe2::CudnnConvGradientOp::DoRunWithType()’:
../caffe2/operators/conv_op_cudnn.cc:1173:11: error: ‘CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope
           CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,

....

../caffe2/operators/conv_op_cudnn.cc:1167:21: error: there are no arguments to ‘cudnnGetConvolutionBackwardFilterAlgorithm’ that depend on a template parameter, so a declaration of ‘cudnnGetConvolutionBackwardFilterAlgorithm’ must be available [-fpermissive]
       CUDNN_ENFORCE(cudnnGetConvolutionBackwardFilterAlgorithm(

....

../caffe2/operators/conv_op_cudnn.cc:860:16:   required from here
../caffe2/operators/conv_op_cudnn.cc:754:56: error: ‘cudnnGetConvolutionForwardAlgorithm’ was not declared in this scope
       CUDNN_ENFORCE(cudnnGetConvolutionForwardAlgorithm(
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
           cudnn_wrapper_.inline_cudnn_handle(),
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~         
           bottom_desc_,
           ~~~~~~~~~~~~~                                 
           filter_desc_,
           ~~~~~~~~~~~~~                                 
           conv_desc_,
           ~~~~~~~~~~~                                   
           top_desc_,
           ~~~~~~~~~~                                    
           CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           cudnn_ws_nbytes_limit_,
           ~~~~~~~~~~~~~~~~~~~~~~~                       
           &algo_));
           ~~~~~~~                                       
../caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
     cudnnStatus_t status = condition;

...

../caffe2/operators/conv_op_cudnn.cc: In instantiation of ‘bool caffe2::CudnnConvGradientOp::DoRunWithType() [with T_X = float; T_DY = float; T_W = float; T_B = float; T_DX = float; T_DW = float; T_DB = float]’:
../caffe2/operators/conv_op_cudnn.cc:1440:16:   required from here
../caffe2/operators/conv_op_cudnn.cc:1167:63: error: ‘cudnnGetConvolutionBackwardFilterAlgorithm’ was not declared in this scope
       CUDNN_ENFORCE(cudnnGetConvolutionBackwardFilterAlgorithm(
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
           cudnn_wrapper_.inline_cudnn_handle(),
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
           bottom_desc_,
           ~~~~~~~~~~~~~                                        
           top_desc_,
           ~~~~~~~~~~                                           
           bwd_filter_conv_desc_,
           ~~~~~~~~~~~~~~~~~~~~~~                               
           filter_desc_,
           ~~~~~~~~~~~~~                                        
           CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           cudnn_ws_nbytes_limit_,
           ~~~~~~~~~~~~~~~~~~~~~~~                              
           &bwd_filter_algo_));
           ~~~~~~~~~~~~~~~~~~                                   
../caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
     cudnnStatus_t status = condition;                     \

I ran the command python3 setup.py develop &> build.log and got the above errors. Searching online suggests this is due to an incompatibility between Caffe and cuDNN, and that it can be resolved by following the patch on Install Caffe on Jetson Nano - Q-engineering, but the error here is in Caffe2 rather than Caffe. Also, Caffe2 is being built directly from the PyTorch source.
My system configuration is:

$ python3 jetsonInfo.py 
NVIDIA Jetson TX2
 L4T 32.4.3 [ JetPack 4.4 ]
   Ubuntu 18.04.4 LTS
   Kernel Version: 4.9.140-tegra
 CUDA 10.2.89
   CUDA Architecture: 6.2
 OpenCV version: 4.1.1
   OpenCV Cuda: NO
 CUDNN: 8.0.0.180
 TensorRT: 7.1.3.0
 Vision Works: 1.6.0.501
 VPI: 0.3.7
 Vulcan: 1.2.70

Please suggest any solutions to this. I am trying to build vanilla PyTorch 1.4.0 on a Jetson TX2. Any help is highly appreciated.
Thank you in advance!

Hi @vision-hobbist1995, this seems kind of related to your other post here:

It seems that you are having trouble installing a variety of packages. I’m not sure if you did an apt dist-upgrade or the system’s package manager somehow got into a broken state. My recommendation would be to back up your work and re-flash your SD card (or flash a different SD card).

Hi @user33662, I’m not sure which version of JetPack you are on, but given the cuDNN errors I believe that PyTorch 1.4 is too old for (and hence incompatible with) your version of JetPack/cuDNN. This is why the PyTorch wheels provided for JetPack 4.4 and newer are only for PyTorch >= 1.6 (updates to PyTorch were required to support cuDNN 8).

My recommendation would be to move to a newer version of PyTorch to maintain compatibility (or alternatively, use an older version of JetPack)
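If you want to double-check which cuDNN version your JetPack image ships, something like this should work (the header location can vary between versions):

# list the cuDNN packages installed by JetPack
dpkg -l | grep -i cudnn
# or read the version macros directly (cuDNN 8 keeps them in cudnn_version.h)
grep -A2 'define CUDNN_MAJOR' /usr/include/cudnn_version.h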

I need the same PyTorch 1.8 with CUDA 10.2 wheel files for the latest Python versions (Python 3.8 or 3.9). That’d be really helpful!

Hi @ihimu, please see my reply to your other topic here:

Or maybe someone else on this thread who built a Python 3.8 wheel can share it with you.


Hi Ken, can you share the PyTorch wheel for Python 3.7? I’m struggling to build it from source, and that is too slow…


Thank you! The Google Drive link for PyTorch 1.7 worked for me.

If you are not tied to Python 3.7 and can use Python 3.8, then use the Google Drive link:

This is the same link that @dusty_nv had linked to in the reply above.
It installs PyTorch 1.7.0 with CUDA enabled on the Jetson Nano.

Hi @dusty_nv,
Is there any chance you can release pip wheels built for cp39-cp39? I’m developing with JetPack 4.6 and OE4T, which is currently on the Honister release of Yocto. That release’s meta-python layer is at Python 3.9, and Yocto recently made a change to its recipe syntax that will make rolling back to an earlier version of Python somewhat painful.

Alternatively, I am considering using the Docker containers here: [NVIDIA L4T PyTorch | NVIDIA NGC]
but I am uncertain how to do a build-time deployment of that. Perhaps I could install the container on the running target, then scp it over to my host and add it as a source file to a bitbake recipe? I don’t know if that is feasible, but I may try it if I can’t get a Python 3.9 based wheel from anywhere.
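For what it’s worth, a rough sketch of that idea using docker pull/save on the target (the image tag is an assumption based on JetPack 4.6 / L4T R32.6 — check NGC for the right one):

# on a running Jetson with network access: pull the L4T PyTorch container
sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
# export the image to a tarball that a bitbake recipe could consume
sudo docker save nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3 | gzip > l4t-pytorch.tar.gz
# later, on the target, load it back:
# sudo docker load < l4t-pytorch.tar.gz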

Thanks
-David


Actually, I just noticed the build-from-scratch instructions. I will give that a try, and if I have success I will post the wheel. If not, I will try out the containers.

Hi,

I am a user that is running a program that requires both torch and torchvision.

So first, I tried the installation as above. Torch installed fine, I ran the tests, and it worked. However, torchvision, after running the commands above, can be imported but has no version attribute.

Precisely, my steps were: navigate to the project directory, create a virtualenv, activate that virtualenv, create a folder called req_folder, cd into req_folder, then install torchvision with exactly the instructions above. I tried importing it like this, but it didn’t work, so after the install I copied the torchvision folder into my virtualenv’s site-packages folder. I could then import torchvision, but there is no version attribute.

There is a functional issue too, as my program cannot run torchvision.transforms either. It appears the install is broken. And yes, I retried with the installation instructions above, making sure the versions are right.

Also, I tried installing torch and torchvision from PyPI. It appears that they have cp36, manylinux, and aarch64 support. However, I could not get my CUDA devices to be available. Is there a way to get those packages to run, or are they incompatible with Jetson despite being aarch64?

Oh, the issue appears to be solved by downloading torchvision directly into the site-packages folder and installing without --user. Can someone explain why this is the case?
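(A guess at the why, plus a quick way to check: pip install --user puts packages under ~/.local, which a virtualenv ignores by default, so installing without --user, or copying into the venv’s site-packages, makes the package visible again. To see which copy actually gets imported:)

# print where torchvision is imported from and its version, if any
python3 -c "import torchvision; print(torchvision.__file__); print(getattr(torchvision, '__version__', 'no version attribute'))"
# and where pip thinks it is installed
pip3 show torchvision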

How did you solve this problem? Or are you still waiting? Looking forward to your reply.

On both a Xavier and a Nano with JetPack 4.6, I am trying to build PyTorch 1.6.0 from source and am getting an error at the same spot, starting here:

[1956/4009] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o
/usr/bin/c++  -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /home/jetson/.pyenv/versions/3.7.4/include/python3.7m -isystem /home/jetson/.virtualenvs/hetseq/lib/python3.7/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I/usr/local/cuda/include -I../torch/csrc/api -I../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../torch/csrc -I../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../caffe2/core/nomnigraph/include -isystem include -I../third_party/FXdiv/include -I../c10/.. -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/NNPACK/include -I../third_party/FP16/include -I../third_party/fmt/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qclamp.cpp.o -c ../aten/src/ATen/native/quantized/cpu/qclamp.cpp
In file included from ../aten/src/ATen/cpu/vec256/vec256.h:10:0,
                 from ../aten/src/ATen/native/cpu/Loops.h:35,
                 from ../aten/src/ATen/native/quantized/cpu/qclamp.cpp:5:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:262:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) const {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:267:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In static member function ‘static at::vec256::{anonymous}::Vec256<float> at::vec256::{anonymous}::Vec256<float>::loadu(const void*, int64_t)’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In member function ‘void at::vec256::{anonymous}::Vec256<float>::store(void*, int64_t) const’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
       vst1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
       vst1q_f32
[1957/4009] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qadd.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qadd.cpp.o
/usr/bin/c++  -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /home/jetson/.pyenv/versions/3.7.4/include/python3.7m -isystem /home/jetson/.virtualenvs/hetseq/lib/python3.7/site-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -I/usr/local/cuda/include -I../torch/csrc/api -I../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../torch/csrc -I../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../caffe2/core/nomnigraph/include -isystem include -I../third_party/FXdiv/include -I../c10/.. -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/NNPACK/include -I../third_party/FP16/include -I../third_party/fmt/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qadd.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qadd.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/quantized/cpu/qadd.cpp.o -c ../aten/src/ATen/native/quantized/cpu/qadd.cpp
In file included from ../aten/src/ATen/cpu/vec256/vec256.h:10:0,
                 from ../aten/src/ATen/native/quantized/cpu/qadd.cpp:3:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:262:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) const {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:267:3: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
   const float operator[](int idx) {
   ^~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In static member function ‘static at::vec256::{anonymous}::Vec256<float> at::vec256::{anonymous}::Vec256<float>::loadu(const void*, int64_t)’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:213:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(ptr));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: error: ‘vld1q_f32_x2’ was not declared in this scope
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:230:14: note: suggested alternative: ‘vld1q_f32’
       return vld1q_f32_x2(reinterpret_cast<const float*>(tmp_values));
              ^~~~~~~~~~~~
              vld1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h: In member function ‘void at::vec256::{anonymous}::Vec256<float>::store(void*, int64_t) const’:
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:235:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(ptr), values);
       ^~~~~~~~~~~~
       vst1q_f32
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: error: ‘vst1q_f32_x2’ was not declared in this scope
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
../aten/src/ATen/cpu/vec256/vec256_float_neon.h:242:7: note: suggested alternative: ‘vst1q_f32’
       vst1q_f32_x2(reinterpret_cast<float*>(tmp_values), values);
       ^~~~~~~~~~~~
       vst1q_f32
[1961/4009] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/ParallelCommon.cpp.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "setup.py", line 737, in <module>
    build_deps()
  File "setup.py", line 321, in build_deps
    cmake=cmake)
  File "/home/jetson/dev/source/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/jetson/dev/source/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/home/jetson/dev/source/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/home/jetson/.pyenv/versions/3.7.4/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.

I have a virtualenv with Python 3.7.4 (built from source via pyenv) and am working on getting https://github.com/yifding/hetseq tested on multiple Jetsons. I followed the steps from above, shown here:

git clone http://github.com/pytorch/pytorch --recursive --branch 1.6
cd pytorch
export USE_NCCL=0
export USE_DISTRIBUTED=0
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2"
export PYTORCH_BUILD_VERSION=1.6.0
export PYTORCH_BUILD_NUMBER=1
sudo apt install cmake libopenblas-dev
pip install -r requirements.txt
pip install scikit-build
pip install ninja
python setup.py bdist_wheel

Any thoughts?

I do want to note that the git clone command in the Build from Source instructions is not written in the correct order; it should be git clone http://github.com/pytorch/pytorch --recursive --branch <version>

I ran the build again and logged it here:

python setup.py bdist_wheel > build.log 2>&1

build.log (13.1 KB)

Hi @funkymunky, this is because the torch packages for Arm on PyPI aren’t built with CUDA support, so they won’t detect your GPU.
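A quick way to verify whether a given torch build can see the GPU:

# prints the torch version and whether CUDA is usable
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"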

I’m not exactly sure why this would be, but then again I hadn’t tried installing it in a virtualenv. Either way, glad that you were able to get it working.

Hi @harrison-matt, I don’t think 1.6 is a release tag for PyTorch; it appears to be a dev branch. v1.6.0 would be the release to clone (check the tags on GitHub). There are also patches to apply, which are linked under the Build from Source section of the instructions.
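i.e., something like this (assuming the v1.6.0 tag):

# clone the v1.6.0 release tag instead of the 1.6 dev branch
git clone --recursive --branch v1.6.0 https://github.com/pytorch/pytorch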

I haven’t tried building this version of PyTorch myself for JetPack 4.6 and with Python 3.7, so if you continue to encounter issues you may want to try a newer version of PyTorch.