Installing PyTorch - /usr/local/cuda/lib64/libcudnn.so: error adding symbols: File in wrong format collect2: error: ld returned 1 exit status

Hi,

Since CUDA 9.0 is a relatively old library, would you mind upgrading your device to our latest software first?

Please note that there are some dependencies between PyTorch and CUDA-related libraries.
It’s possible that PyTorch expects newer software, which leads to this error.
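The `error adding symbols: File in wrong format` message in the title commonly means the linker was handed a library built for a different architecture (for example an x86-64 library on the aarch64 TX2) or a file that is not a usable shared object. A minimal sketch for checking this, assuming Python 3 on the device; `elf_machine`, `library_arch` and `ELF_MACHINES` are hypothetical helper names, and the table lists only a few `e_machine` values from the ELF specification. The shell `file` command reports the same information.

```python
import struct

# A few e_machine values from the ELF specification; extend as needed.
ELF_MACHINES = {
    3: "x86 (i386)",
    40: "ARM (32-bit)",
    62: "x86-64",
    183: "AArch64",
}


def elf_machine(header: bytes) -> str:
    """Return the architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF shared object")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 (after e_ident and e_type).
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (e_machine={machine})")


def library_arch(path: str) -> str:
    """Read the ELF header of a library; open() follows symlinks."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

For example, `library_arch("/usr/local/cuda/lib64/libcudnn.so")` should report `AArch64` on a TX2; anything else points at a library copied from another platform.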

Thanks.

@AastaLLL I am unable to upgrade further because of hardware problems I encountered. I am running a DJI Manifold 2, which contains an NVIDIA Jetson TX2. It currently has JetPack 3.3, and I have tried upgrading it to JetPack 4.4.

However, after I upgrade to JetPack 4.4, any system updates I run afterwards cause the USB hardware interfaces to stop working. If I only upgrade to JetPack 4.4 and do not perform any system updates, the USB devices work fine, but that also means I cannot just run sudo apt-get install whenever I need to.

I believe the last version of PyTorch to support CUDA 9.0 was PyTorch 1.4 or PyTorch 1.5, but this may take some trial and error. It’s also unclear if that was the last version of PyTorch to officially release x86 binaries for CUDA 9.0 or if the PyTorch code itself doesn’t support building with CUDA 9.0.

It appears that you are building PyTorch with Python 2.7, but PyTorch 1.4 was the last release to support Python 2, so please build with python3 instead.
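A small guard can catch this before a long build starts. A minimal sketch, assuming the build is launched from a Python 3 interpreter you control; `check_interpreter` and `MIN_PYTHON` are hypothetical names, and the 3.6 floor is an assumption taken from the JetPack system Python discussed in this thread, not an official PyTorch requirement.

```python
import sys

# Assumption: 3.6 is used as the floor because it is the JetPack 4.x system
# Python discussed in this thread; adjust for the PyTorch release being built.
MIN_PYTHON = (3, 6)


def check_interpreter(version_info=sys.version_info, minimum=MIN_PYTHON) -> str:
    """Exit early if the interpreter running the build is too old."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) < tuple(minimum):
        raise SystemExit(
            f"Python {major}.{minor} is too old to build this PyTorch release; "
            f"re-run with python3 (>= {minimum[0]}.{minimum[1]})"
        )
    return f"{major}.{minor}"
```

Calling `check_interpreter()` at the top of a build wrapper, and printing `sys.executable` next to it, also shows whether a virtual environment's `python` is the one actually running `setup.py`.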

Can you do apt-get update and apt-get install without doing an apt-get upgrade?

@dusty_nv I am running a virtual environment with Python 3.8, so when I call python, it’s actually running python3.8.

Sorry, just to clarify: after I upgrade to JetPack 4.4, I can still run those commands. Yes, I can run apt-get update and apt-get install, and even apt-get upgrade. The issue is that some of the packages installed, downloaded, or upgraded via these commands may cause my USB interfaces to stop working.

OK gotcha - here is a topic about building PyTorch for Python 3.8 that may be helpful:

If it’s the apt-get upgrade command that causes the USB interfaces on your DJI Manifold to stop working, you may want to refrain from performing the package upgrades or determine which package(s) to mark as hold so they aren’t upgraded.
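One way to find the culprit is to list the pending upgrades with `apt list --upgradable`, upgrade in small batches while testing the USB devices, and hold the suspects with `sudo apt-mark hold` (reversible with `sudo apt-mark unhold`). The sketch below parses that listing and builds the hold command. It assumes apt’s human-readable format (`name/suite candidate arch [upgradable from: installed]`), which apt itself warns is not a stable CLI interface; the helper names and the package names in the example are hypothetical.

```python
def upgradable_packages(apt_list_output: str) -> dict:
    """Map package name -> (installed version, candidate version)."""
    pkgs = {}
    for line in apt_list_output.splitlines():
        if "[upgradable from:" not in line:
            continue  # skips the "Listing..." header and any warnings
        name_suite, candidate, _arch, *_rest = line.split()
        installed = line.rsplit("[upgradable from:", 1)[1].strip(" ]")
        pkgs[name_suite.split("/", 1)[0]] = (installed, candidate)
    return pkgs


def hold_command(packages) -> list:
    """Build an `apt-mark hold` invocation for the given package names."""
    return ["sudo", "apt-mark", "hold", *sorted(packages)]
```

The returned list can be passed to `subprocess.run` once the suspect packages have been narrowed down.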

@dusty_nv would you happen to know of a list of apt packages that may interact with the USB interfaces on a TX2?

Unfortunately I don’t as I haven’t encountered that previously - you may want to file a different topic about this separate issue, or try contacting the drone manufacturer if it’s specific to their device.

@dusty_nv I managed to upgrade my device to JetPack 4.4, which contains CUDA 10.2. I know you maintain the PyTorch for Jetson page with the PyTorch installers, but I noticed that they are for Python 3.6.
Do you have a list for Python 3.8?
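To confirm which CUDA toolkit a build will pick up after the upgrade, one option is to read the toolkit’s version file or `nvcc --version`. A minimal sketch, assuming the CUDA 10.x layout where `/usr/local/cuda/version.txt` contains a line like `CUDA Version 10.2.89` (newer toolkits may ship a JSON file instead); `parse_cuda_version` and `installed_cuda_version` are hypothetical helpers.

```python
import re
import subprocess

# Matches both "CUDA Version 10.2.89" (version.txt) and
# "release 10.2, V10.2.89" (nvcc --version).
_CUDA_RE = re.compile(r"(?:CUDA Version|release)\s+(\d+)\.(\d+)")


def parse_cuda_version(text: str) -> tuple:
    """Return (major, minor) from version.txt or nvcc output."""
    m = _CUDA_RE.search(text)
    if m is None:
        raise ValueError("no CUDA version found")
    return int(m.group(1)), int(m.group(2))


def installed_cuda_version(version_file="/usr/local/cuda/version.txt",
                           nvcc="/usr/local/cuda/bin/nvcc") -> tuple:
    """Prefer version.txt; fall back to asking nvcc."""
    try:
        with open(version_file) as f:
            return parse_cuda_version(f.read())
    except FileNotFoundError:
        out = subprocess.run([nvcc, "--version"], capture_output=True,
                             text=True, check=True).stdout
        return parse_cuda_version(out)
```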

Hi @pylonicGateway, I personally only build the PyTorch wheels for Python 3.6 because that is the default version of Python that comes with the version of Ubuntu currently in JetPack and it’s a lot for me to support different versions of Python. If you search that PyTorch post though, I believe other users have posted their wheels for Python 3.7 or 3.8.
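A Python 3.6 wheel cannot simply be installed into a Python 3.8 environment because CPython wheels carry an interpreter tag in their filename (`cp36`, `cp38`, ...) and pip rejects tags that don’t match the running interpreter. A minimal sketch of that check; the helper names are hypothetical, the filenames are illustrative rather than actual download links, and only the CPython tag is compared (pip’s real compatibility logic also considers the ABI and platform tags).

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp36') of a wheel filename.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the Python tag is always the third field from the end.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    return filename[: -len(".whl")].split("-")[-3]


def wheel_matches_interpreter(filename: str, version_info=sys.version_info) -> bool:
    """True if the wheel's CPython tag matches the given interpreter version."""
    return wheel_python_tag(filename) == f"cp{version_info[0]}{version_info[1]}"
```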

@dusty_nv OK, I think I have found something relevant at Install PyTorch with Python 3.8 on Jetpack 4.4.1 - #2 by dusty_nv. I will try this out.

I encountered this error while trying to install:

/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/c10/cuda/CUDAMathCompat.h: In static member function ‘static scalar_t at::native::copysign_kernel_cuda(at::TensorIterator&)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t)>::_FUN(scalar_t, scalar_t)’:
/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/c10/cuda/CUDAMathCompat.h:46:24: internal compiler error: Segmentation fault
   return ::copysignf(x, y);
                        ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
CMake Error at torch_cuda_generated_CopysignKernel.cu.o.Release.cmake:281 (message):
  Error generating file
  /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_CopysignKernel.cu.o


caffe2/CMakeFiles/torch_cuda.dir/build.make:89336: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_CopysignKernel.cu.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_CopysignKernel.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/usr/include/c++/7/cmath: In static member function ‘static scalar_t at::native::div_floor_kernel_cuda(at::TensorIterator&)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t)>::_FUN(scalar_t, scalar_t)’:
/usr/include/c++/7/cmath:1302:38: internal compiler error: Segmentation fault
   { return __builtin_copysignf(__x, __y); }
                                      ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
CMake Error at torch_cuda_generated_BinaryMulDivKernel.cu.o.Release.cmake:281 (message):
  Error generating file
  /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_BinaryMulDivKernel.cu.o


caffe2/CMakeFiles/torch_cuda.dir/build.make:79700: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_BinaryMulDivKernel.cu.o] Error 1
CMakeFiles/Makefile2:8965: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/torch_cuda.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 818, in <module>
    build_deps()
  File "setup.py", line 315, in build_deps
    build_caffe2(version=version,
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/tools/build_pytorch_libs.py", line 58, in build_caffe2
    cmake.build(my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch1.8/tools/setup_helpers/cmake.py", line 140, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 2.

Is this caused by a GCC version problem?

Hi @pylonicGateway, which version of PyTorch are you building? It looks to be this error:

https://github.com/pytorch/pytorch/pull/51834

Note that a workaround is included for this in my patch set for PyTorch 1.8:
https://gist.github.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1#file-pytorch-1-8-jetpack-4-4-1-patch

@dusty_nv hey, I’m not sure if I am using your PyTorch patch right…
Sorry, I have never used a patch file before.

So I copied your patch file for “pytorch1.8 jetpack4.4.1” into a gedit file, then put that file into the pytorch folder.
I got this output:

patch < torch18.patch 
can't find file to patch at input line 5
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/aten/src/ATen/cuda/CUDAContext.cpp b/aten/src/ATen/cuda/CUDAContext.cpp
|index 1751128f1a..03e74f5ac2 100644
|--- a/aten/src/ATen/cuda/CUDAContext.cpp
|+++ b/aten/src/ATen/cuda/CUDAContext.cpp
--------------------------
File to patch: 

Am I doing this right?

I created that patch using git diff, so I think git apply would be the way to apply it; however, this requires that you have cloned the same branch of PyTorch that I did. For sanity, I normally just apply these patches by hand (by going into the individual files and copy/pasting the changes).

@dusty_nv hi, I have tried patching by hand and this is what I got:

-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.22.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 7.4.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /usr/local/cuda-10.2
--   CMAKE_INSTALL_PREFIX  : /usr/local
-- 
--   TORCH_VERSION         : 1.8.0
--   CAFFE2_VERSION        : 1.8.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Protobuf compiler   : 
--     Protobuf includes   : 
--     Protobuf libraries  : 
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : ON
--     Python version      : 3.8
--     Python executable   : /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/bin/python
--     Pythonlibs version  : 3.8.0
--     Python library      : /usr/lib/python3.8
--     Python includes     : /usr/include/python3.8
--     Python site-packages: lib/python3.8/site-packages
--   BUILD_SHARED_LIBS     : OFF
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : ON
--   BUILD_TEST            : OFF
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : open
--   USE_LAPACK            : 1
--     LAPACK              : open
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : OFF
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 10.2
--     cuDNN version       : 8.0.0
--     CUDA root directory : /usr/local/cuda-10.2
--     CUDA library        : /usr/local/cuda-10.2/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda-10.2/lib64/libcudart.so
--     cublas library      : /usr/lib/aarch64-linux-gnu/libcublas.so
--     cufft library       : /usr/local/cuda-10.2/lib64/libcufft.so
--     curand library      : /usr/local/cuda-10.2/lib64/libcurand.so
--     cuDNN library       : /usr/lib/aarch64-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda-10.2/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda-10.2/include
--     NVCC executable     : /usr/local/cuda-10.2/bin/nvcc
--     NVCC flags          : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : OFF
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   USE_DEPLOY           : OFF
--   Public Dependencies  : Threads::Threads
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
-- Generating done
-- Build files have been written to: /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch/build
[0/1] Re-running CMake...
-- std::exception_ptr is supported.
-- Turning off deprecation warning due to glog.
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- 
-- 3.11.4.0
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:152 (message):
  MKL could not be found.  Defaulting to Eigen
Call Stack (most recent call first):
  CMakeLists.txt:564 (include)


CMake Warning at cmake/Dependencies.cmake:175 (message):
  Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
  library
Call Stack (most recent call first):
  CMakeLists.txt:564 (include)


-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_gf - mkl_intel_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_gf - mkl_intel_thread - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_gnu_thread - mkl_core - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_gnu_thread - mkl_core - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf_lp64 - mkl_intel_thread - mkl_core - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_gnu_thread - mkl_core - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_gf - mkl_intel_thread - mkl_core - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_sequential - mkl_core - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_sequential - mkl_core - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_sequential - mkl_core - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_core - gomp - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_core - iomp5 - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl_intel_lp64 - mkl_core - pthread - m - dl]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - pthread - m - dl]
--   Library mkl_intel: not found
-- Checking for [mkl_gf_lp64 - mkl_core - pthread - m - dl]
--   Library mkl_gf_lp64: not found
-- Checking for [mkl_gf - mkl_core - pthread - m - dl]
--   Library mkl_gf: not found
-- Checking for [mkl - guide - pthread - m]
--   Library mkl: not found
-- MKL library not found
-- Checking for [Accelerate]
--   Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
--   Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Found OpenBLAS libraries: /usr/lib/aarch64-linux-gnu/libopenblas.so
-- Found OpenBLAS include: /usr/include/aarch64-linux-gnu
-- Found a library with BLAS API (open). Full path: (/usr/lib/aarch64-linux-gnu/libopenblas.so)
-- Brace yourself, we are building NNPACK
-- NNPACK backend is neon
CMake Warning at cmake/Dependencies.cmake:732 (message):
  Turning USE_FAKELOWP off as it depends on USE_FBGEMM.
Call Stack (most recent call first):
  CMakeLists.txt:564 (include)


-- Found Numa  (include: /usr/include, library: /usr/lib/aarch64-linux-gnu/libnuma.so)
-- Using third party subdirectory Eigen.
-- Setting Python to /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/bin/python
-- Setting Python's include dir to /usr/include/python3.8 from distutils.sysconfig
-- Setting Python's library to /usr/lib/python3.8
-- NumPy ver. 1.22.2 found (include: /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/lib/python3.8/site-packages/numpy/core/include)
-- Using third_party/pybind11.
-- pybind11 include dirs: /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch/cmake/../third_party/pybind11/include
-- MPI support found
-- MPI compile flags: -pthread
-- MPI include path: /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/aarch64-linux-gnu/openmpi/include
-- MPI LINK flags path: -pthread
-- MPI libraries: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so
CMake Warning at cmake/Dependencies.cmake:1038 (message):
  OpenMPI found, but it is not built with CUDA support.
Call Stack (most recent call first):
  CMakeLists.txt:564 (include)


-- Adding OpenMP CXX_FLAGS: -fopenmp
-- No OpenMP library needs to be linked against
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda-10.2/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-10.2
-- Caffe2: Header version is: 10.2
-- Found cuDNN: v8.0.0  (include: /usr/include, library: /usr/lib/aarch64-linux-gnu/libcudnn.so)
-- /usr/local/cuda-10.2/lib64/libnvrtc.so shorthash is c13e41e1
CMake Warning at cmake/public/utils.cmake:365 (message):
  In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST
  to cmake instead of implicitly setting it as an env variable.  This will
  become a FATAL_ERROR in future version of pytorch.
Call Stack (most recent call first):
  cmake/public/cuda.cmake:483 (torch_cuda_get_nvcc_gencode_flag)
  cmake/Dependencies.cmake:1148 (include)
  CMakeLists.txt:564 (include)


-- Added CUDA NVCC flags for: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
CMake Warning at cmake/External/nccl.cmake:66 (message):
  Objcopy version is too old to support NCCL library slimming
Call Stack (most recent call first):
  cmake/Dependencies.cmake:1271 (include)
  CMakeLists.txt:564 (include)


-- Could NOT find CUB (missing: CUB_INCLUDE_DIR) 
-- Gloo build as STATIC library
-- MPI include path: /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include/usr/lib/aarch64-linux-gnu/openmpi/include
-- MPI libraries: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so
-- Found CUDA: /usr/local/cuda-10.2 (found suitable version "10.2", minimum required is "7.0") 
-- CUDA detected: 10.2
CMake Warning (dev) at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
  The package name passed to `find_package_handle_standard_args` (NCCL) does
  not match the name of the calling package (nccl).  This can lead to
  problems in calling code that expects `find_package` result variables
  (e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
  third_party/gloo/cmake/Modules/Findnccl.cmake:45 (find_package_handle_standard_args)
  third_party/gloo/cmake/Dependencies.cmake:128 (find_package)
  third_party/gloo/CMakeLists.txt:102 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Could NOT find NCCL (missing: NCCL_INCLUDE_DIR NCCL_LIBRARY) 
CMake Warning at third_party/gloo/cmake/Dependencies.cmake:133 (message):
  Not compiling with NCCL support.  Suppress this warning with
  -DUSE_NCCL=OFF.
Call Stack (most recent call first):
  third_party/gloo/CMakeLists.txt:102 (include)

The issue is that this compiler output seems to be repeating: the installer isn’t making any progress, so there appears to be an infinite loop.

Hi @pylonicGateway, these all seem to be different messages - at a glance, I don’t see them repeating. What happens if you just let it run?
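One way to answer that from a saved log (for example, output captured with `python3 setup.py bdist_wheel 2>&1 | tee build.log`) is to count the markers that appear in the logs in this thread: the CMake `******** Summary ********` banner, the `[0/1] Re-running CMake...` line, and the final `still dirty after 100 tries` error. A minimal sketch; `configure_stats` is a hypothetical helper. Many summaries and reruns with no compile steps in between would suggest ninja keeps regenerating its build files rather than compiling.

```python
def configure_stats(log_text: str) -> dict:
    """Count markers showing whether CMake configuration keeps repeating."""
    lines = log_text.splitlines()
    return {
        # One banner per completed CMake configure pass.
        "summaries": sum("******** Summary ********" in ln for ln in lines),
        # Printed when ninja decides build.ninja must be regenerated.
        "reruns": sum("Re-running CMake" in ln for ln in lines),
        # Ninja gives up with this error after repeated regeneration attempts.
        "still_dirty": any("still dirty after" in ln for ln in lines),
    }
```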

@dusty_nv Ah, I only copied over the part that was looping, not the whole terminal output.

I left it running for about 30 minutes and it kept going through the same output until it hit this portion:

-- 
-- ******** Summary ********
-- General:
--   CMake version         : 3.22.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 7.4.0
--   CXX flags             :  -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /usr/local/cuda-10.2
--   CMAKE_INSTALL_PREFIX  : /usr/local
-- 
--   TORCH_VERSION         : 1.8.0
--   CAFFE2_VERSION        : 1.8.0
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : OFF
--   BUILD_CUSTOM_PROTOBUF : ON
--     Protobuf compiler   : 
--     Protobuf includes   : 
--     Protobuf libraries  : 
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : ON
--     Python version      : 3.8
--     Python executable   : /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/bin/python
--     Pythonlibs version  : 3.8.0
--     Python library      : /usr/lib/python3.8
--     Python includes     : /usr/include/python3.8
--     Python site-packages: lib/python3.8/site-packages
--   BUILD_SHARED_LIBS     : OFF
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : ON
--   BUILD_TEST            : OFF
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : open
--   USE_LAPACK            : 1
--     LAPACK              : open
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : ON
--     Split CUDA          : OFF
--     CUDA static link    : OFF
--     USE_CUDNN           : ON
--     CUDA version        : 10.2
--     cuDNN version       : 8.0.0
--     CUDA root directory : /usr/local/cuda-10.2
--     CUDA library        : /usr/local/cuda-10.2/lib64/stubs/libcuda.so
--     cudart library      : /usr/local/cuda-10.2/lib64/libcudart.so
--     cublas library      : /usr/lib/aarch64-linux-gnu/libcublas.so
--     cufft library       : /usr/local/cuda-10.2/lib64/libcufft.so
--     curand library      : /usr/local/cuda-10.2/lib64/libcurand.so
--     cuDNN library       : /usr/lib/aarch64-linux-gnu/libcudnn.so
--     nvrtc               : /usr/local/cuda-10.2/lib64/libnvrtc.so
--     CUDA include path   : /usr/local/cuda-10.2/include
--     NVCC executable     : /usr/local/cuda-10.2/bin/nvcc
--     NVCC flags          : -Xfatbin;-compress-all;-DONNX_NAMESPACE=onnx_torch;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72;-Xcudafe;--diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl;-std=c++14;-Xcompiler;-fPIC;--expt-relaxed-constexpr;--expt-extended-lambda;-Wno-deprecated-gpu-targets;--expt-extended-lambda;-Xcompiler;-fPIC;-DCUDA_HAS_FP16=1;-D__CUDA_NO_HALF_OPERATORS__;-D__CUDA_NO_HALF_CONVERSIONS__;-D__CUDA_NO_BFLOAT16_CONVERSIONS__;-D__CUDA_NO_HALF2_OPERATORS__
--     CUDA host compiler  : /usr/bin/cc
--     NVCC --device-c     : OFF
--     USE_TENSORRT        : OFF
--   USE_ROCM              : OFF
--   USE_EIGEN_FOR_BLAS    : ON
--   USE_FBGEMM            : OFF
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : OFF
--   USE_FFMPEG            : OFF
--   USE_GFLAGS            : OFF
--   USE_GLOG              : OFF
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : OFF
--   USE_MKLDNN            : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : OFF
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : OFF
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI             : ON
--     USE_GLOO            : ON
--     USE_TENSORPIPE      : ON
--   USE_DEPLOY           : OFF
--   Public Dependencies  : Threads::Threads
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;/usr/lib/aarch64-linux-gnu/libnuma.so;fp16;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so;/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;gcc_s;gcc;dl
-- Configuring done
-- Generating done
-- Build files have been written to: /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch/build
ninja: error: manifest 'build.ninja' still dirty after 100 tries

Traceback (most recent call last):
  File "setup.py", line 818, in <module>
    build_deps()
  File "setup.py", line 315, in build_deps
    build_caffe2(version=version,
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/build_pytorch_libs.py", line 58, in build_caffe2
    cmake.build(my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/setup_helpers/cmake.py", line 140, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.

It seems like ninja is giving me some problems. However, if I run pip uninstall ninja and then run setup.py for PyTorch again, I get this:

Building wheel torch-1.8.0
-- Building version 1.8.0
cmake --build . --target install --config Release -- -j 6
CMake Error: The current CMakeCache.txt directory /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/build/CMakeCache.txt is different than the directory /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorch/build where CMakeCache.txt was created. This may result in binaries being created in the wrong place. If you are not sure, reedit the CMakeCache.txt
No such file or directory
CMake Error: Generator: execution of make failed. Make command was: /media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/bin/ninja -j 6 install && 
Traceback (most recent call last):
  File "setup.py", line 818, in <module>
    build_deps()
  File "setup.py", line 315, in build_deps
    build_caffe2(version=version,
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/build_pytorch_libs.py", line 58, in build_caffe2
    cmake.build(my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/media/8eb89120-ba8b-4110-96c7-3179158d68ee/pyenvs/newpy38/pytorchOLD/tools/setup_helpers/cmake.py", line 140, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.

The result seems largely similar.
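The first CMake error above says the cache was created in `.../pytorch/build` but is now being used from `.../pytorchOLD/build`, i.e. the source tree was renamed after it was configured. CMake records absolute paths in `CMakeCache.txt`, including the build tool (`CMAKE_MAKE_PROGRAM`); here that is the venv’s `bin/ninja`, which no longer exists after `pip uninstall ninja` and would plausibly explain the `No such file or directory`. A common remedy is to delete the stale `build` directory (or run `python3 setup.py clean`) and configure again. A minimal sketch for detecting a moved cache; `cache_entries` and `stale_cache` are hypothetical helpers that read the `CMAKE_CACHEFILE_DIR` entry CMake writes into the cache.

```python
import os


def cache_entries(cache_text: str) -> dict:
    """Parse KEY:TYPE=VALUE lines from a CMakeCache.txt into {KEY: VALUE}."""
    entries = {}
    for line in cache_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "//")):
            continue  # blank lines, comments and help strings
        key_and_type, sep, value = line.partition("=")
        if sep:
            entries[key_and_type.split(":", 1)[0]] = value
    return entries


def stale_cache(build_dir: str, cache_text: str) -> bool:
    """True if the cache was created in a different directory than build_dir."""
    recorded = cache_entries(cache_text).get("CMAKE_CACHEFILE_DIR")
    if recorded is None:
        return False
    return os.path.normpath(recorded) != os.path.normpath(build_dir)
```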

Unfortunately I’m not sure how to fix the issue, or if it’s specific to Jetson/ARM or your configuration at all. You could try to build an unpatched/vanilla clone of PyTorch repo and see if that’s related. You may also want to try posting to the PyTorch forum about it to see if they may know.

@dusty_nv would you know if the version of CMake matters? I had also upgraded to CMake 3.22. Previously, when I used an older version of CMake, there was no infinite loop.

Hmm I’m not sure myself, as I build PyTorch with the default version of CMake in the Ubuntu 18.04 repo (which is CMake 3.10.2). That’s a good observation though - and since the infinite loop does seem to be occurring during the CMake config stage, you may want to try downgrading it.
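Before downgrading, it can help to confirm which `cmake` the build actually invokes, since a second CMake (for example one installed with pip into a virtual environment) can shadow `/usr/bin/cmake` on `PATH`. A minimal sketch, assuming Python 3.7+ for `capture_output`; the helpers are hypothetical, and treating 3.10.2 as the known-good baseline only reflects the versions mentioned in this thread, not a documented PyTorch requirement.

```python
import re
import shutil
import subprocess

# Assumption: Ubuntu 18.04's packaged CMake (3.10.2) is treated as the
# known-good baseline because that is what the working builds here used.
KNOWN_GOOD = (3, 10, 2)


def cmake_version(version_output: str) -> tuple:
    """Parse 'cmake version X.Y.Z' from `cmake --version` output."""
    m = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError("unrecognized `cmake --version` output")
    return tuple(int(part) for part in m.groups())


def newer_than_known_good(version_output: str, known_good=KNOWN_GOOD) -> bool:
    """True if the parsed version is newer than the baseline."""
    return cmake_version(version_output) > tuple(known_good)


def active_cmake():
    """Return (path, version) of the cmake found first on PATH, or None."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    out = subprocess.run([exe, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return exe, cmake_version(out)
```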
