[REQUEST] Build script for PyTorch, or an up-to-date PyTorch binary release, supporting Jetson boards running L4T 35.6 (Ubuntu 20.04)

As of this writing, the latest PyTorch version is 2.5 (possibly newer by the time you read this).

JetPack 5.1.3/5.1.4, based on Ubuntu 20.04, only supports torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl.

Can NVIDIA provide an up-to-date build, or share a build script so we can compile the latest PyTorch version ourselves?

Hi,

The latest PyTorch is built with the latest JetPack, which is currently 6.1.
For the build commands, please see the link below:

Thanks

@AastaLLL I need the latest build for JetPack 5.1.4 (L4T 35.6), not JetPack 6.x. Thanks.

Do you have PyTorch v2.3.0 for JetPack 5.1.4 (L4T 35.6)?

EDIT: Currently we have 2.1.0a0+41361538.nv23.06 for JetPack 5.1.4 (L4T 35.6), which is based on Ubuntu 20.04. I don’t think the PyTorch v2.3.0 wheel for JetPack 6.0 (L4T 36.2), which is based on Ubuntu 22.04, can be installed properly in my current environment.

Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Orin Nano Developer Kit - Jetpack 5.1.4 [L4T 35.6.0]
NV Power Mode[0]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3767-0005
 - Module: NVIDIA Jetson Orin Nano (Developer kit)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.216-tegra
jtop:
 - Version: 4.2.12
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 8.5.2.2
 - VPI: 2.4.8
 - OpenCV: 4.9.0 - with CUDA: YES
DeepStream C/C++ SDK version: 6.3

Python Environment:
Python 3.8.10
    GStreamer:                   YES (1.16.3)
  NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
        OpenCV version: 4.9.0  CUDA True
          YOLO version: 8.3.33
         Torch version: 2.1.0a0+41361538.nv23.06
   Torchvision version: 0.16.1+fdea156
DeepStream SDK version: 1.1.8

EDIT2: And I haven’t seen any EOL announcement for JetPack 5 yet, which is why I proposed the two options above (an up-to-date wheel or a build script) for long-term support of the JetPack 5.x series.

EDIT3: @AastaLLL I didn’t find build commands in the link you provided above, only wheel installation commands. What I mean is building from source; if NVIDIA keeps part of the code private, it could still provide the libraries that the open-source part links against.

Hi,

Please check Instructions → Build from Source.
Thanks.


@AastaLLL Oh sorry, I didn’t notice this. That’s great!

Thank you very much!

PS: We have run into some version compatibility issues here: Yolov8s no bounding box on default settings · Issue #597 · marcoslucianops/DeepStream-Yolo · GitHub. So it’s good to have a backup plan.

Continuing the discussion from PyTorch for Jetson:

@AastaLLL

I checked out the latest v2.5.1 for my JetPack 5.1.4 and followed the steps below. Is my TORCH_CUDA_ARCH_LIST set wrong, or is something else not properly configured, causing the configure error?

git clone --recursive --branch v2.5.1 git@github.com:pytorch/pytorch.git
export USE_NCCL=0
export USE_DISTRIBUTED=0
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="7.2;8.7"
export PYTORCH_BUILD_VERSION=2.5.1
export PYTORCH_BUILD_NUMBER=1
cd pytorch/

sudo apt-get install python3-pip cmake libopenblas-dev libopenmpi-dev 

pip3 install -r requirements.txt
pip3 install scikit-build
pip3 install ninja
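A full PyTorch build takes hours on an Orin Nano, so it can save time to sanity-check the exported variables before starting. A minimal sketch (the variable names are the ones exported above; this is not part of the official build steps):

```shell
# Print each build-related variable, flagging any that are unset.
# POSIX sh compatible (uses eval for indirection).
for v in USE_NCCL USE_DISTRIBUTED USE_QNNPACK USE_PYTORCH_QNNPACK \
         TORCH_CUDA_ARCH_LIST PYTORCH_BUILD_VERSION PYTORCH_BUILD_NUMBER; do
  val=$(eval "printf '%s' \"\${$v:-<unset>}\"")
  printf '%-25s = %s\n' "$v" "$val"
done
```

If anything prints `<unset>`, re-export it in the same shell you will run `setup.py` from — the exports do not survive opening a new terminal.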


daniel@daniel-nvidia:~/Work/pytorch$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-------------------------------------------------------------------------------------------------
|                                                                                               |
|            WARNING: we strongly recommend enabling linker script optimization for ARM + CUDA. |
|            To do so please export USE_PRIORITIZED_TEXT_FOR_LD=1                               |
|                                                                                               |
-------------------------------------------------------------------------------------------------
-- Building version 2.5.1
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/daniel/Work/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3.8/site-packages -DPython_EXECUTABLE=/usr/bin/python3 -DTORCH_BUILD_VERSION=2.5.1 -DTORCH_CUDA_ARCH_LIST=7.2;8.7 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/daniel/Work/pytorch
-- The CXX compiler identification is GNU 9.4.0
-- The C compiler identification is GNU 9.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- /usr/bin/c++ /home/daniel/Work/pytorch/torch/abi-check.cpp -o /home/daniel/Work/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- Not forcing any particular BLAS to be found
-- Performing Test C_HAS_AVX_1
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX_2
-- Performing Test C_HAS_AVX_2 - Failed
-- Performing Test C_HAS_AVX_3
-- Performing Test C_HAS_AVX_3 - Failed
-- Performing Test C_HAS_AVX2_1
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX2_2
-- Performing Test C_HAS_AVX2_2 - Failed
-- Performing Test C_HAS_AVX2_3
-- Performing Test C_HAS_AVX2_3 - Failed
-- Performing Test C_HAS_AVX512_1
-- Performing Test C_HAS_AVX512_1 - Failed
-- Performing Test C_HAS_AVX512_2
-- Performing Test C_HAS_AVX512_2 - Failed
-- Performing Test C_HAS_AVX512_3
-- Performing Test C_HAS_AVX512_3 - Failed
-- Performing Test CXX_HAS_AVX_1
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX_2
-- Performing Test CXX_HAS_AVX_2 - Failed
-- Performing Test CXX_HAS_AVX_3
-- Performing Test CXX_HAS_AVX_3 - Failed
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Failed
-- Performing Test CXX_HAS_AVX2_3
-- Performing Test CXX_HAS_AVX2_3 - Failed
-- Performing Test CXX_HAS_AVX512_1
-- Performing Test CXX_HAS_AVX512_1 - Failed
-- Performing Test CXX_HAS_AVX512_2
-- Performing Test CXX_HAS_AVX512_2 - Failed
-- Performing Test CXX_HAS_AVX512_3
-- Performing Test CXX_HAS_AVX512_3 - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success
-- Found CUDA: /usr/local/cuda (found version "11.4")
-- The CUDA compiler identification is unknown
CMake Error at /usr/local/share/cmake-3.31/Modules/CMakeDetermineCUDACompiler.cmake:266 (message):
  Failed to detect a default CUDA architecture.



  Compiler output:

Call Stack (most recent call first):
  cmake/public/cuda.cmake:47 (enable_language)
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:863 (include)


-- Configuring incomplete, errors occurred!

EDIT: After export USE_PRIORITIZED_TEXT_FOR_LD=1:

daniel@daniel-nvidia:~/Work/pytorch$ export USE_PRIORITIZED_TEXT_FOR_LD=1
daniel@daniel-nvidia:~/Work/pytorch$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-- Building version 2.5.1
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/daniel/Work/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3.8/site-packages -DPython_EXECUTABLE=/usr/bin/python3 -DTORCH_BUILD_VERSION=2.5.1 -DTORCH_CUDA_ARCH_LIST=7.2;8.7 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PRIORITIZED_TEXT_FOR_LD=1 -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/daniel/Work/pytorch
-- /usr/bin/c++ /home/daniel/Work/pytorch/torch/abi-check.cpp -o /home/daniel/Work/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
CMake Error at /usr/local/share/cmake-3.31/Modules/Internal/CMakeCUDAArchitecturesValidate.cmake:7 (message):
  CMAKE_CUDA_ARCHITECTURES must be non-empty if set.
Call Stack (most recent call first):
  /usr/local/share/cmake-3.31/Modules/CMakeDetermineCUDACompiler.cmake:112 (cmake_cuda_architectures_validate)
  cmake/public/cuda.cmake:47 (enable_language)
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:863 (include)


-- Configuring incomplete, errors occurred!

EDIT2: It seems something is wrong with CMake. Any ideas or tips to fix this?

Hi,

GPU architecture for the Orin series is 8.7.
Either export TORCH_CUDA_ARCH_LIST="7.2;8.7" or export TORCH_CUDA_ARCH_LIST="8.7" is fine.

Based on this error: The CUDA compiler identification is unknown
Could you verify that nvcc is on your PATH?

$ nvcc --version

If the command returns a "not found" error, please run the commands below to add CUDA to your environment variables (on JetPack 5 the toolkit is under /usr/local/cuda-11.4, with /usr/local/cuda symlinked to it):

$ export PATH=/usr/local/cuda/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Once nvcc reports its version successfully, please try the PyTorch build commands again.
Thanks.
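For a JetPack 5 board specifically, CUDA 11.4 lives under /usr/local/cuda-11.4 with /usr/local/cuda symlinked to it, so a sketch of the exports plus a quick verification might look like this (append the same three export lines to ~/.bashrc to make them permanent):

```shell
# JetPack 5 layout: /usr/local/cuda -> /etc/alternatives/cuda -> cuda-11.4.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# Verify: on JetPack 5.1.x this should report "release 11.4".
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep -i release
else
  echo "nvcc still not on PATH - check that $CUDA_HOME/bin exists"
fi
```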

It seems the Jetson Orin runs out of memory.

EDIT: I ran python3 setup.py bdist_wheel multiple times; every run ends up here:

$ export PATH=/usr/local/cuda/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-- Building version 2.5.1
cmake --build . --target install --config Release
[1/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/daniel/Work/pytorch/build/aten/src -I/home/daniel/Work/pytorch/aten/src -I/home/daniel/Work/pytorch/build -I/home/daniel/Work/pytorch -I/home/daniel/Work/pytorch/cmake/../third_party/benchmark/include -I/home/daniel/Work/pytorch/third_party/onnx -I/home/daniel/Work/pytorch/build/third_party/onnx -I/home/daniel/Work/pytorch/nlohmann -I/home/daniel/Work/pytorch/torch/csrc/api -I/home/daniel/Work/pytorch/torch/csrc/api/include -I/home/daniel/Work/pytorch/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src -I/home/daniel/Work/pytorch/build/caffe2/../aten/src -I/home/daniel/Work/pytorch/torch/csrc -I/home/daniel/Work/pytorch/third_party/miniz-2.1.0 -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/include -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/src -I/home/daniel/Work/pytorch/third_party/cpp-httplib -I/home/daniel/Work/pytorch/aten/src/ATen/.. -I/home/daniel/Work/pytorch/third_party/FXdiv/include -I/home/daniel/Work/pytorch/c10/.. 
-I/home/daniel/Work/pytorch/third_party/pthreadpool/include -I/home/daniel/Work/pytorch/third_party/cpuinfo/include -I/home/daniel/Work/pytorch/third_party/NNPACK/include -I/home/daniel/Work/pytorch/third_party/FP16/include -I/home/daniel/Work/pytorch/third_party/fmt/include -I/home/daniel/Work/pytorch/third_party/flatbuffers/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/daniel/Work/pytorch/third_party/protobuf/src -isystem /home/daniel/Work/pytorch/third_party/XNNPACK/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/daniel/Work/pytorch/INTERFACE -isystem /home/daniel/Work/pytorch/third_party/nlohmann/include -isystem /home/daniel/Work/pytorch/build/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -pthread -fopenmp -MD -MT 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o -c /home/daniel/Work/pytorch/build/aten/src/ATen/RegisterCPU.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[2/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/daniel/Work/pytorch/build/aten/src -I/home/daniel/Work/pytorch/aten/src -I/home/daniel/Work/pytorch/build -I/home/daniel/Work/pytorch -I/home/daniel/Work/pytorch/cmake/../third_party/benchmark/include -I/home/daniel/Work/pytorch/third_party/onnx -I/home/daniel/Work/pytorch/build/third_party/onnx -I/home/daniel/Work/pytorch/nlohmann -I/home/daniel/Work/pytorch/torch/csrc/api -I/home/daniel/Work/pytorch/torch/csrc/api/include -I/home/daniel/Work/pytorch/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src -I/home/daniel/Work/pytorch/build/caffe2/../aten/src -I/home/daniel/Work/pytorch/torch/csrc -I/home/daniel/Work/pytorch/third_party/miniz-2.1.0 -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/include -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/src -I/home/daniel/Work/pytorch/third_party/cpp-httplib -I/home/daniel/Work/pytorch/aten/src/ATen/.. -I/home/daniel/Work/pytorch/third_party/FXdiv/include -I/home/daniel/Work/pytorch/c10/.. 
-I/home/daniel/Work/pytorch/third_party/pthreadpool/include -I/home/daniel/Work/pytorch/third_party/cpuinfo/include -I/home/daniel/Work/pytorch/third_party/NNPACK/include -I/home/daniel/Work/pytorch/third_party/FP16/include -I/home/daniel/Work/pytorch/third_party/fmt/include -I/home/daniel/Work/pytorch/third_party/flatbuffers/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/daniel/Work/pytorch/third_party/protobuf/src -isystem /home/daniel/Work/pytorch/third_party/XNNPACK/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/daniel/Work/pytorch/INTERFACE -isystem /home/daniel/Work/pytorch/third_party/nlohmann/include -isystem /home/daniel/Work/pytorch/build/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -pthread -fopenmp -MD -MT 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o -c /home/daniel/Work/pytorch/build/aten/src/ATen/Operators_2.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[8/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_1.cpp.o
ninja: build stopped: subcommand failed.
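`c++: fatal error: Killed signal terminated program cc1plus` is the kernel's OOM killer at work: the 8 GB Orin Nano cannot run several multi-gigabyte C++ compiles in parallel. A common workaround (a sketch — the 16 GB swap size and the ~4 GB-per-job heuristic are guesses that have worked on 8 GB boards, not official numbers) is to add swap and cap the parallelism via MAX_JOBS, which PyTorch's setup.py honors:

```shell
# 1) Add a swap file so individual huge translation units can finish.
if [ ! -f /swapfile ]; then
  sudo fallocate -l 16G /swapfile \
    && sudo chmod 600 /swapfile \
    && sudo mkswap /swapfile \
    && sudo swapon /swapfile
fi

# 2) Cap ninja's parallelism; PyTorch's setup.py reads MAX_JOBS.
#    Rough heuristic: one C++ job per ~4 GB of physical RAM.
mem_kb=$(grep MemTotal /proc/meminfo 2>/dev/null | awk '{print $2}')
jobs=$(( ${mem_kb:-4194304} / 4194304 ))   # 4194304 kB = 4 GB
[ "$jobs" -lt 1 ] && jobs=1
export MAX_JOBS=$jobs
echo "Building with MAX_JOBS=$MAX_JOBS"
```

Then re-run `python3 setup.py bdist_wheel`; ninja resumes from the object files that already compiled.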

EDIT2: /usr/local/cuda is a symlink.

daniel@daniel-nvidia:~/Work/pytorch$ ls -l /usr/local/cuda
cuda/      cuda-11/   cuda-11.4/
daniel@daniel-nvidia:~/Work/pytorch$ ls -l /usr/local/cuda*
lrwxrwxrwx  1 root root   22 Nov  7 09:03 /usr/local/cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Nov  7 09:03 /usr/local/cuda-11 -> /etc/alternatives/cuda-11

/usr/local/cuda-11.4:
total 112
drwxr-xr-x  3 root root  4096 Nov  7 09:00 bin
drwxr-xr-x  4 root root  4096 Nov  7 09:00 compute-sanitizer
-rw-r--r--  1 root root   160 Sep 14  2022 DOCS
-rw-r--r--  1 root root 61727 Sep 14  2022 EULA.txt
drwxr-xr-x  4 root root  4096 Nov  7 09:00 extras
lrwxrwxrwx  1 root root    29 Sep 19  2022 include -> targets/aarch64-linux/include
lrwxrwxrwx  1 root root    25 Sep 14  2022 lib64 -> targets/aarch64-linux/lib
drwxr-xr-x  3 root root  4096 Nov  7 09:00 nvml
drwxr-xr-x  7 root root  4096 Nov  7 08:59 nvvm
-rw-r--r--  1 root root   524 Sep 14  2022 README
drwxr-xr-x 11 root root  4096 Nov  7 09:00 samples
drwxr-xr-x  3 root root  4096 Nov  7 09:00 share
drwxr-xr-x  3 root root  4096 Nov  7 08:58 targets
drwxr-xr-x  2 root root  4096 Nov  7 09:00 tools
-rw-r--r--  1 root root  2127 Oct 25  2022 version.json

EDIT3:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0

After countless (20+?) builds, they all end with the error below. So which version is suggested for JetPack 5.1.4?

daniel@daniel-nvidia:~/Work/pytorch$ git log -n 1
commit a8d6afb511a69687bbb2b7e88a3cf67917e1697e (HEAD, tag: v2.5.1-rc1, tag: v2.5.1, origin/release/2.5)
Author: pytorchbot <soumith+bot@pytorch.org>
Date:   Tue Oct 22 18:14:52 2024 -0700

    Disabling amp context when invoking compiler (#138659)

    Disabling amp context when invoking compiler (#138624)

    Fix for https://github.com/pytorch/pytorch/issues/133974

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/138624
    Approved by: https://github.com/bdhirsh, https://github.com/drisspg

    (cherry picked from commit 5942b2985000e0c69ec955b6c88dee8b5d7e67fd)

    Co-authored-by: eellison <elias.ellison@gmail.com>

Failed log attached here:
log.txt (126.5 KB)


EDIT: Here is a link to the PyTorch issue ticket (hoping to get more help on this): pytorch v2.5.1 build for nvidia jetson orin 8GB · Issue #143624 · pytorch/pytorch · GitHub

Hi,

Based on their documentation, it looks like you need Python >= 3.9 and CUDA >= 11.8 to build PyTorch 2.5.x.
Maybe the PyTorch team can share more info about their dependencies.
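Those minimums can be checked quickly on the board; a small sketch (the paths are the JetPack 5 defaults):

```shell
# Compare the local toolchain against PyTorch 2.5.x's stated minimums
# (Python >= 3.9, CUDA >= 11.8).
py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
echo "Python: $py (need >= 3.9 for PyTorch 2.5.x)"

if [ -x /usr/local/cuda/bin/nvcc ]; then
  /usr/local/cuda/bin/nvcc --version | grep -i release
else
  echo "nvcc not found under /usr/local/cuda (need CUDA >= 11.8)"
fi
```

On a stock JetPack 5.1.4 image this reports Python 3.8 and CUDA 11.4, i.e. below both minimums, which is consistent with the build failures above.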

Thanks.

@AastaLLL Thanks again.

I will need NVIDIA’s expertise to help me check which version is OK on my current (latest Ubuntu 20.04) JetPack 5.1.4 system with CUDA 11.4.315, cuDNN 8.6.0.166, and Python 3.8.10.

If I understand correctly, the issue above is caused by Python 3.8? But none of the suggested versions matches JetPack 5.1.4.

Now, which version would be suitable for me to compile? How about 2.4? Will CUDA or cuDNN fail?

v2.4.1 still failed - pytorch v2.4.1 build for nvidia jetson orin nano 8GB #143816

$ git log -n 1
commit ee1b6804381c57161c477caa380a840a84167676 (HEAD, tag: v2.4.1, origin/release/2.4)
Author: pytorchbot <soumith+bot@pytorch.org>
Date:   Wed Aug 28 17:25:42 2024 -0700

    [Doc] Fix rendering of the unicode characters (#134695)

    * [Doc] Fix rendering of the unicode characters (#134597)

    https://github.com/pytorch/pytorch/pull/124771 introduced unicode escape sequences inside raw strings, which were not rendered correctly. Also fix typo in `\uue0 ` escape sequence (should have been `\u00e0`)
    Fix it by relying on [string literal concatenation](https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation) to join raw and regular strings together during lexical analysis stage

    Fixes https://github.com/pytorch/pytorch/issues/134422

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/134597
    Approved by: https://github.com/aorenste, https://github.com/Skylion007

    (cherry picked from commit 534f43ddce24ab6bafa3aed42ee3d68947073d3f)

    * Fix lint

    ---------

    Co-authored-by: Nikita Shulga <nshulga@meta.com>

build log:

Building wheel torch-2.4.1
-- Building version 2.4.1
cmake --build . --target install --config Release
[1/2055] Linking CXX shared library lib/libc10.so
FAILED: lib/libc10.so
: && /usr/bin/c++ -fPIC -ffunction-sections -fdata-sections -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG  -T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld -Wl,--no-as-needed  -T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld -rdynamic   -Wl,--no-as-needed -shared -Wl,-soname,libc10.so -o lib/libc10.so c10/CMakeFiles/c10.dir/core/Allocator.cpp.o c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o c10/CMakeFiles/c10.dir/core/Device.cpp.o c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o c10/CMakeFiles/c10.dir/core/GradMode.cpp.o c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o c10/CMakeFiles/c10.dir/core/Scalar.cpp.o c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o c10/CMakeFiles/c10.dir/core/Storage.cpp.o c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o c10/CMakeFiles/c10.dir/core/Stream.cpp.o 
c10/CMakeFiles/c10.dir/core/SymBool.cpp.o c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o c10/CMakeFiles/c10.dir/core/SymInt.cpp.o c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o c10/CMakeFiles/c10.dir/util/C++17.cpp.o c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o c10/CMakeFiles/c10.dir/util/Exception.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Half.cpp.o c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o c10/CMakeFiles/c10.dir/util/Logging.cpp.o c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o 
c10/CMakeFiles/c10.dir/util/Optional.cpp.o c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o c10/CMakeFiles/c10.dir/util/TypeList.cpp.o c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Unicode.cpp.o c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o c10/CMakeFiles/c10.dir/util/complex_math.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o c10/CMakeFiles/c10.dir/util/int128.cpp.o c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o c10/CMakeFiles/c10.dir/util/numa.cpp.o c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o c10/CMakeFiles/c10.dir/util/tempfile.cpp.o c10/CMakeFiles/c10.dir/util/thread_name.cpp.o c10/CMakeFiles/c10.dir/util/typeid.cpp.o  -Wl,-rpath,:::::::  /usr/lib/aarch64-linux-gnu/libnuma.so  lib/libcpuinfo.a  -pthread && /usr/local/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=lib/libc10.so && :
/usr/bin/ld: error: linker script file '/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld' appears multiple times
collect2: error: ld returned 1 exit status
[8/2055] Building CXX object c10/test/CMakeFiles/c10_LeftRight_test.dir/util/LeftRight_test.cpp.o
ninja: build stopped: subcommand failed.
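Note that `-T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld` appears twice in the failing link command, which points at the `USE_PRIORITIZED_TEXT_FOR_LD=1` option injecting the linker script into both the compile and the link flags. One workaround to try (a guess based on the duplicated flag, not a confirmed fix) is to drop that option and reconfigure from a clean tree:

```shell
# Disable the linker-script optimization that duplicated -T<linker_script.ld>.
unset USE_PRIORITIZED_TEXT_FOR_LD

# CMake caches keep old flags around, so wipe the build tree before
# reconfiguring. (Run from the pytorch source checkout.)
if [ -f setup.py ]; then
  rm -rf build
  python3 setup.py clean
  python3 setup.py bdist_wheel
else
  echo "run this from the pytorch source directory"
fi
```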

Hi,

Since PyTorch is a third-party library, they will know the compatibility better.

We have PyTorch prebuilt packages for the JetPack 5 environment up to v2.1.
You can find them here: PyTorch for Jetson
You can also get a newer PyTorch if the latest JetPack version is used.

Thanks.

Yes, I know. There is also nvidia-cuda-support on PyTorch, which points to selecting a supported CUDA version from their support matrix, and that is beyond what JetPack 5.1.4 offers.

That’s why I created this topic: long-term support for JetPack 5.1.4 / L4T 35.6 (Ubuntu 20.04).

Does NVIDIA still support PyTorch above v2.1 on JetPack 5?

  • I saw that Python 3.8 should support up to PyTorch v2.4.x, but apparently there are few or no resources on PyTorch above v2.1 for JetPack 5.

Any info or plan?

PS: Note that JetPack 5 supports ROS, while JetPack 6 supports ROS 2.

pytorch v2.3.1 build failed - CUDA kernel function #143935

@AastaLLL

Any idea about “cub bundled with your CUDA toolkit is too old.”? Is there a plan for JetPack 5.1.5, or some way to upgrade the CUDA toolkit on JetPack 5.1.4?

Hi,

We release PyTorch prebuilt wheels so users can pick them up directly without building on their own.
But since PyTorch is a third-party library, you will need to check with them whether they can support the combination you need.

It looks like you are running into many issues with PyTorch builds.
How about trying our TensorRT library, which works with JetPack 5 without building from source?
TensorRT is optimized for Jetson devices, and you can get it through JetPack directly.

Thanks.

Well, it seems OK. The PyTorch team is working on some of the issues now.

And as you can see in the previous threads, there is significant progress on building on the Jetson Orin Nano 8GB board.

PS: Anyone can try building the code; just follow my setup/build process and patches here: Linux 35.6 + JetPack v5.1.4: Compiling PyTorch

Is there any documentation or advice on how to use TensorRT for object detection with YOLO or other kinds of models?

Currently we are evaluating different kinds of open-source tech on the Jetson Orin Nano (edge compute).

Especially small multi-object detection/tracking; please share if you have any relevant info.

Hi,

Yes, you can find a YOLO example in our TensorRT GitHub:

There are many tutorials (for newer YOLO versions) shared by the community.
Please search for them online, as we cannot share third-party links here.
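For reference, the flow most community tutorials follow (a sketch — the model and file names are placeholders, and trtexec ships with JetPack under /usr/src/tensorrt/bin) is to export the YOLO model to ONNX first, then build a TensorRT engine on the Jetson:

```shell
# Build a TensorRT engine from an exported ONNX detector (placeholder names).
ONNX=yolov8s.onnx
ENGINE=yolov8s_fp16.engine
TRTEXEC=${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}

if [ -x "$TRTEXEC" ] && [ -f "$ONNX" ]; then
  # FP16 is usually the right speed/accuracy trade-off on Orin.
  "$TRTEXEC" --onnx="$ONNX" --saveEngine="$ENGINE" --fp16
else
  echo "need trtexec ($TRTEXEC) and an exported ONNX model ($ONNX)"
fi
```

The resulting .engine file can then be loaded by DeepStream or a custom TensorRT runtime app. Note that engines are specific to the TensorRT version and GPU they were built on, so build them on the target Jetson.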

Thanks.

Thanks.
