[REQUEST] Build script for PyTorch, or an up-to-date PyTorch binary release, supporting Jetson boards running L4T 35.6 (Ubuntu 20.04)

As of this writing, the latest PyTorch version is 2.5 (possibly newer by the time you read this).

JetPack 5.1.3/5.1.4, based on Ubuntu 20.04, only supports torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl.

Can NVIDIA provide an up-to-date build, or share a build script so we can compile the latest PyTorch version ourselves?

Hi,

The latest PyTorch is built with the latest JetPack, which is currently 6.1.
For the build commands, please see the link below:

Thanks

@AastaLLL I need the latest build for JetPack 5.1.4 (L4T 35.6), not JetPack 6.x. Thanks.

Do you have PyTorch v2.3.0 for JetPack 5.1.4 (L4T 35.6)?

EDIT: Currently we have 2.1.0a0+41361538.nv23.06 for JetPack 5.1.4 (L4T 35.6), which is based on Ubuntu 20.04. I don’t think the PyTorch v2.3.0 wheel for JetPack 6.0 (L4T 36.2), which is based on Ubuntu 22.04, can be installed properly in my current environment.

Software part of jetson-stats 4.2.12 - (c) 2024, Raffaello Bonghi
Model: NVIDIA Orin Nano Developer Kit - Jetpack 5.1.4 [L4T 35.6.0]
NV Power Mode[0]: 15W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3767-0005
 - Module: NVIDIA Jetson Orin Nano (Developer kit)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.216-tegra
jtop:
 - Version: 4.2.12
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 8.5.2.2
 - VPI: 2.4.8
 - OpenCV: 4.9.0 - with CUDA: YES
DeepStream C/C++ SDK version: 6.3

Python Environment:
Python 3.8.10
    GStreamer:                   YES (1.16.3)
  NVIDIA CUDA:                   YES (ver 11.4, CUFFT CUBLAS FAST_MATH)
        OpenCV version: 4.9.0  CUDA True
          YOLO version: 8.3.33
         Torch version: 2.1.0a0+41361538.nv23.06
   Torchvision version: 0.16.1+fdea156
DeepStream SDK version: 1.1.8

EDIT2: And I haven’t seen any EOL announcement for JetPack 5 yet, which is why I proposed the two options above (an up-to-date wheel or a build script) for long-term support of the JetPack 5.x series.

EDIT3: @AastaLLL I didn’t find build commands in the link you provided above, only wheel installation commands. What I mean is building from source; if NVIDIA keeps part of the code private, it could still provide the libraries that the open-source part links against.

Hi,

Please check Instructions → Build from Source.
Thanks.


@AastaLLL Oh sorry, I didn’t notice this. That’s great!

Thank you very much!

PS: We have run into some version compatibility issues here: Yolov8s no bounding box on default settings · Issue #597 · marcoslucianops/DeepStream-Yolo · GitHub. So it’s good to have a backup plan.

Continuing the discussion from PyTorch for Jetson:

@AastaLLL

I checked out the latest v2.5.1 for my JetPack 5.1.4 and followed the steps below. Is my TORCH_CUDA_ARCH_LIST set wrong, or is something else not properly configured, causing the configure error?

git clone --recursive --branch v2.5.1 git@github.com:pytorch/pytorch.git
export USE_NCCL=0
export USE_DISTRIBUTED=0
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="7.2;8.7"
export PYTORCH_BUILD_VERSION=2.5.1
export PYTORCH_BUILD_NUMBER=1
cd pytorch/

sudo apt-get install python3-pip cmake libopenblas-dev libopenmpi-dev 

pip3 install -r requirements.txt
pip3 install scikit-build
pip3 install ninja
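A full PyTorch build takes hours on an Orin Nano, so it can save time to sanity-check the exported variables before starting. A minimal sketch (the variable names are the ones exported above; this is not part of the official build steps):

```shell
# Print each build-related variable, flagging any that are unset.
# POSIX sh compatible (uses eval for indirection).
for v in USE_NCCL USE_DISTRIBUTED USE_QNNPACK USE_PYTORCH_QNNPACK \
         TORCH_CUDA_ARCH_LIST PYTORCH_BUILD_VERSION PYTORCH_BUILD_NUMBER; do
  val=$(eval "printf '%s' \"\${$v:-<unset>}\"")
  printf '%-25s = %s\n' "$v" "$val"
done
```

If anything prints `<unset>`, re-export it in the same shell you will run `setup.py` from — the exports do not survive opening a new terminal.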


daniel@daniel-nvidia:~/Work/pytorch$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-------------------------------------------------------------------------------------------------
|                                                                                               |
|            WARNING: we strongly recommend enabling linker script optimization for ARM + CUDA. |
|            To do so please export USE_PRIORITIZED_TEXT_FOR_LD=1                               |
|                                                                                               |
-------------------------------------------------------------------------------------------------
-- Building version 2.5.1
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/daniel/Work/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3.8/site-packages -DPython_EXECUTABLE=/usr/bin/python3 -DTORCH_BUILD_VERSION=2.5.1 -DTORCH_CUDA_ARCH_LIST=7.2;8.7 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/daniel/Work/pytorch
-- The CXX compiler identification is GNU 9.4.0
-- The C compiler identification is GNU 9.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- /usr/bin/c++ /home/daniel/Work/pytorch/torch/abi-check.cpp -o /home/daniel/Work/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
-- Not forcing any particular BLAS to be found
-- Performing Test C_HAS_AVX_1
-- Performing Test C_HAS_AVX_1 - Failed
-- Performing Test C_HAS_AVX_2
-- Performing Test C_HAS_AVX_2 - Failed
-- Performing Test C_HAS_AVX_3
-- Performing Test C_HAS_AVX_3 - Failed
-- Performing Test C_HAS_AVX2_1
-- Performing Test C_HAS_AVX2_1 - Failed
-- Performing Test C_HAS_AVX2_2
-- Performing Test C_HAS_AVX2_2 - Failed
-- Performing Test C_HAS_AVX2_3
-- Performing Test C_HAS_AVX2_3 - Failed
-- Performing Test C_HAS_AVX512_1
-- Performing Test C_HAS_AVX512_1 - Failed
-- Performing Test C_HAS_AVX512_2
-- Performing Test C_HAS_AVX512_2 - Failed
-- Performing Test C_HAS_AVX512_3
-- Performing Test C_HAS_AVX512_3 - Failed
-- Performing Test CXX_HAS_AVX_1
-- Performing Test CXX_HAS_AVX_1 - Failed
-- Performing Test CXX_HAS_AVX_2
-- Performing Test CXX_HAS_AVX_2 - Failed
-- Performing Test CXX_HAS_AVX_3
-- Performing Test CXX_HAS_AVX_3 - Failed
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Failed
-- Performing Test CXX_HAS_AVX2_3
-- Performing Test CXX_HAS_AVX2_3 - Failed
-- Performing Test CXX_HAS_AVX512_1
-- Performing Test CXX_HAS_AVX512_1 - Failed
-- Performing Test CXX_HAS_AVX512_2
-- Performing Test CXX_HAS_AVX512_2 - Failed
-- Performing Test CXX_HAS_AVX512_3
-- Performing Test CXX_HAS_AVX512_3 - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Success
-- Found CUDA: /usr/local/cuda (found version "11.4")
-- The CUDA compiler identification is unknown
CMake Error at /usr/local/share/cmake-3.31/Modules/CMakeDetermineCUDACompiler.cmake:266 (message):
  Failed to detect a default CUDA architecture.



  Compiler output:

Call Stack (most recent call first):
  cmake/public/cuda.cmake:47 (enable_language)
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:863 (include)


-- Configuring incomplete, errors occurred!

EDIT: After export USE_PRIORITIZED_TEXT_FOR_LD=1:

daniel@daniel-nvidia:~/Work/pytorch$ export USE_PRIORITIZED_TEXT_FOR_LD=1
daniel@daniel-nvidia:~/Work/pytorch$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-- Building version 2.5.1
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/daniel/Work/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3.8/site-packages -DPython_EXECUTABLE=/usr/bin/python3 -DTORCH_BUILD_VERSION=2.5.1 -DTORCH_CUDA_ARCH_LIST=7.2;8.7 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PRIORITIZED_TEXT_FOR_LD=1 -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/daniel/Work/pytorch
-- /usr/bin/c++ /home/daniel/Work/pytorch/torch/abi-check.cpp -o /home/daniel/Work/pytorch/build/abi-check
-- Determined _GLIBCXX_USE_CXX11_ABI=1
CMake Error at /usr/local/share/cmake-3.31/Modules/Internal/CMakeCUDAArchitecturesValidate.cmake:7 (message):
  CMAKE_CUDA_ARCHITECTURES must be non-empty if set.
Call Stack (most recent call first):
  /usr/local/share/cmake-3.31/Modules/CMakeDetermineCUDACompiler.cmake:112 (cmake_cuda_architectures_validate)
  cmake/public/cuda.cmake:47 (enable_language)
  cmake/Dependencies.cmake:44 (include)
  CMakeLists.txt:863 (include)


-- Configuring incomplete, errors occurred!

EDIT2: It seems something is wrong with CMake. Any ideas or tips to fix this?

Hi,

GPU architecture for the Orin series is 8.7.
Either export TORCH_CUDA_ARCH_LIST="7.2;8.7" or export TORCH_CUDA_ARCH_LIST="8.7" is fine.

Based on this error: The CUDA compiler identification is unknown
Could you verify that nvcc is on your PATH?

$ nvcc --version

If the command returns a "not found" error, please run the commands below to add CUDA to your environment variables (on JetPack 5 the toolkit is under /usr/local/cuda-11.4, with /usr/local/cuda symlinked to it):

$ export PATH=/usr/local/cuda/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Once nvcc reports its version successfully, please try the PyTorch build commands again.
Thanks.
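For a JetPack 5 board specifically, CUDA 11.4 lives under /usr/local/cuda-11.4 with /usr/local/cuda symlinked to it, so a sketch of the exports plus a quick verification might look like this (append the same three export lines to ~/.bashrc to make them permanent):

```shell
# JetPack 5 layout: /usr/local/cuda -> /etc/alternatives/cuda -> cuda-11.4.
export CUDA_HOME=/usr/local/cuda
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# Verify: on JetPack 5.1.x this should report "release 11.4".
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep -i release
else
  echo "nvcc still not on PATH - check that $CUDA_HOME/bin exists"
fi
```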

It seems the Jetson Orin runs out of memory.

EDIT: I ran python3 setup.py bdist_wheel multiple times; every run ends up here:

$ export PATH=/usr/local/cuda/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
$ python3 setup.py bdist_wheel
Building wheel torch-2.5.1
-- Building version 2.5.1
cmake --build . --target install --config Release
[1/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/daniel/Work/pytorch/build/aten/src -I/home/daniel/Work/pytorch/aten/src -I/home/daniel/Work/pytorch/build -I/home/daniel/Work/pytorch -I/home/daniel/Work/pytorch/cmake/../third_party/benchmark/include -I/home/daniel/Work/pytorch/third_party/onnx -I/home/daniel/Work/pytorch/build/third_party/onnx -I/home/daniel/Work/pytorch/nlohmann -I/home/daniel/Work/pytorch/torch/csrc/api -I/home/daniel/Work/pytorch/torch/csrc/api/include -I/home/daniel/Work/pytorch/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src -I/home/daniel/Work/pytorch/build/caffe2/../aten/src -I/home/daniel/Work/pytorch/torch/csrc -I/home/daniel/Work/pytorch/third_party/miniz-2.1.0 -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/include -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/src -I/home/daniel/Work/pytorch/third_party/cpp-httplib -I/home/daniel/Work/pytorch/aten/src/ATen/.. -I/home/daniel/Work/pytorch/third_party/FXdiv/include -I/home/daniel/Work/pytorch/c10/.. 
-I/home/daniel/Work/pytorch/third_party/pthreadpool/include -I/home/daniel/Work/pytorch/third_party/cpuinfo/include -I/home/daniel/Work/pytorch/third_party/NNPACK/include -I/home/daniel/Work/pytorch/third_party/FP16/include -I/home/daniel/Work/pytorch/third_party/fmt/include -I/home/daniel/Work/pytorch/third_party/flatbuffers/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/daniel/Work/pytorch/third_party/protobuf/src -isystem /home/daniel/Work/pytorch/third_party/XNNPACK/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/daniel/Work/pytorch/INTERFACE -isystem /home/daniel/Work/pytorch/third_party/nlohmann/include -isystem /home/daniel/Work/pytorch/build/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -pthread -fopenmp -MD -MT 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterCPU.cpp.o -c /home/daniel/Work/pytorch/build/aten/src/ATen/RegisterCPU.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[2/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFLASHATTENTION_DISABLE_ALIBI -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/daniel/Work/pytorch/build/aten/src -I/home/daniel/Work/pytorch/aten/src -I/home/daniel/Work/pytorch/build -I/home/daniel/Work/pytorch -I/home/daniel/Work/pytorch/cmake/../third_party/benchmark/include -I/home/daniel/Work/pytorch/third_party/onnx -I/home/daniel/Work/pytorch/build/third_party/onnx -I/home/daniel/Work/pytorch/nlohmann -I/home/daniel/Work/pytorch/torch/csrc/api -I/home/daniel/Work/pytorch/torch/csrc/api/include -I/home/daniel/Work/pytorch/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src/TH -I/home/daniel/Work/pytorch/build/caffe2/aten/src -I/home/daniel/Work/pytorch/build/caffe2/../aten/src -I/home/daniel/Work/pytorch/torch/csrc -I/home/daniel/Work/pytorch/third_party/miniz-2.1.0 -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/include -I/home/daniel/Work/pytorch/third_party/kineto/libkineto/src -I/home/daniel/Work/pytorch/third_party/cpp-httplib -I/home/daniel/Work/pytorch/aten/src/ATen/.. -I/home/daniel/Work/pytorch/third_party/FXdiv/include -I/home/daniel/Work/pytorch/c10/.. 
-I/home/daniel/Work/pytorch/third_party/pthreadpool/include -I/home/daniel/Work/pytorch/third_party/cpuinfo/include -I/home/daniel/Work/pytorch/third_party/NNPACK/include -I/home/daniel/Work/pytorch/third_party/FP16/include -I/home/daniel/Work/pytorch/third_party/fmt/include -I/home/daniel/Work/pytorch/third_party/flatbuffers/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/daniel/Work/pytorch/third_party/protobuf/src -isystem /home/daniel/Work/pytorch/third_party/XNNPACK/include -isystem /home/daniel/Work/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/daniel/Work/pytorch/INTERFACE -isystem /home/daniel/Work/pytorch/third_party/nlohmann/include -isystem /home/daniel/Work/pytorch/build/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wunused-function -Wunused-variable -Wunused-but-set-variable -Wno-maybe-uninitialized -fvisibility=hidden -O2 -pthread -fopenmp -MD -MT 
caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_2.cpp.o -c /home/daniel/Work/pytorch/build/aten/src/ATen/Operators_2.cpp
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[8/1482] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/Operators_1.cpp.o
ninja: build stopped: subcommand failed.
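`c++: fatal error: Killed signal terminated program cc1plus` is the kernel's OOM killer at work: the 8 GB Orin Nano cannot run several multi-gigabyte C++ compiles in parallel. A common workaround (a sketch — the 16 GB swap size and the ~4 GB-per-job heuristic are guesses that have worked on 8 GB boards, not official numbers) is to add swap and cap the parallelism via MAX_JOBS, which PyTorch's setup.py honors:

```shell
# 1) Add a swap file so individual huge translation units can finish.
if [ ! -f /swapfile ]; then
  sudo fallocate -l 16G /swapfile \
    && sudo chmod 600 /swapfile \
    && sudo mkswap /swapfile \
    && sudo swapon /swapfile
fi

# 2) Cap ninja's parallelism; PyTorch's setup.py reads MAX_JOBS.
#    Rough heuristic: one C++ job per ~4 GB of physical RAM.
mem_kb=$(grep MemTotal /proc/meminfo 2>/dev/null | awk '{print $2}')
jobs=$(( ${mem_kb:-4194304} / 4194304 ))   # 4194304 kB = 4 GB
[ "$jobs" -lt 1 ] && jobs=1
export MAX_JOBS=$jobs
echo "Building with MAX_JOBS=$MAX_JOBS"
```

Then re-run `python3 setup.py bdist_wheel`; ninja resumes from the object files that already compiled.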

EDIT2: /usr/local/cuda is a symlink.

daniel@daniel-nvidia:~/Work/pytorch$ ls -l /usr/local/cuda
cuda/      cuda-11/   cuda-11.4/
daniel@daniel-nvidia:~/Work/pytorch$ ls -l /usr/local/cuda*
lrwxrwxrwx  1 root root   22 Nov  7 09:03 /usr/local/cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Nov  7 09:03 /usr/local/cuda-11 -> /etc/alternatives/cuda-11

/usr/local/cuda-11.4:
total 112
drwxr-xr-x  3 root root  4096 Nov  7 09:00 bin
drwxr-xr-x  4 root root  4096 Nov  7 09:00 compute-sanitizer
-rw-r--r--  1 root root   160 Sep 14  2022 DOCS
-rw-r--r--  1 root root 61727 Sep 14  2022 EULA.txt
drwxr-xr-x  4 root root  4096 Nov  7 09:00 extras
lrwxrwxrwx  1 root root    29 Sep 19  2022 include -> targets/aarch64-linux/include
lrwxrwxrwx  1 root root    25 Sep 14  2022 lib64 -> targets/aarch64-linux/lib
drwxr-xr-x  3 root root  4096 Nov  7 09:00 nvml
drwxr-xr-x  7 root root  4096 Nov  7 08:59 nvvm
-rw-r--r--  1 root root   524 Sep 14  2022 README
drwxr-xr-x 11 root root  4096 Nov  7 09:00 samples
drwxr-xr-x  3 root root  4096 Nov  7 09:00 share
drwxr-xr-x  3 root root  4096 Nov  7 08:58 targets
drwxr-xr-x  2 root root  4096 Nov  7 09:00 tools
-rw-r--r--  1 root root  2127 Oct 25  2022 version.json

EDIT3:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Sun_Oct_23_22:16:07_PDT_2022
Cuda compilation tools, release 11.4, V11.4.315
Build cuda_11.4.r11.4/compiler.31964100_0

After countless (20+?) builds, they all end with the error below. So which version is suggested for JetPack 5.1.4?

daniel@daniel-nvidia:~/Work/pytorch$ git log -n 1
commit a8d6afb511a69687bbb2b7e88a3cf67917e1697e (HEAD, tag: v2.5.1-rc1, tag: v2.5.1, origin/release/2.5)
Author: pytorchbot <soumith+bot@pytorch.org>
Date:   Tue Oct 22 18:14:52 2024 -0700

    Disabling amp context when invoking compiler (#138659)

    Disabling amp context when invoking compiler (#138624)

    Fix for https://github.com/pytorch/pytorch/issues/133974

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/138624
    Approved by: https://github.com/bdhirsh, https://github.com/drisspg

    (cherry picked from commit 5942b2985000e0c69ec955b6c88dee8b5d7e67fd)

    Co-authored-by: eellison <elias.ellison@gmail.com>

Failed log attached here:
log.txt (126.5 KB)


EDIT: Here is a link to the PyTorch issue ticket (hoping to get more help on this): pytorch v2.5.1 build for nvidia jetson orin 8GB · Issue #143624 · pytorch/pytorch · GitHub

Hi,

Based on their documentation, it looks like you need Python >= 3.9 and CUDA >= 11.8 to build PyTorch 2.5.x.
Maybe the PyTorch team can share more info about their dependencies.
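Those minimums can be checked quickly on the board; a small sketch (the paths are the JetPack 5 defaults):

```shell
# Compare the local toolchain against PyTorch 2.5.x's stated minimums
# (Python >= 3.9, CUDA >= 11.8).
py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
echo "Python: $py (need >= 3.9 for PyTorch 2.5.x)"

if [ -x /usr/local/cuda/bin/nvcc ]; then
  /usr/local/cuda/bin/nvcc --version | grep -i release
else
  echo "nvcc not found under /usr/local/cuda (need CUDA >= 11.8)"
fi
```

On a stock JetPack 5.1.4 image this reports Python 3.8 and CUDA 11.4, i.e. below both minimums, which is consistent with the build failures above.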

Thanks.

@AastaLLL Thanks again.

I will need NVIDIA’s expertise to help me check which version is OK on my current (latest Ubuntu 20.04) JetPack 5.1.4 system with CUDA 11.4.315, cuDNN 8.6.0.166, and Python 3.8.10.

If I understand correctly, the issue above is caused by Python 3.8? But none of the suggested versions matches JetPack 5.1.4.

Now, which version would be suitable for me to compile? How about 2.4? Will CUDA or cuDNN fail?

v2.4.1 still failed - pytorch v2.4.1 build for nvidia jetson orin nano 8GB #143816

$ git log -n 1
commit ee1b6804381c57161c477caa380a840a84167676 (HEAD, tag: v2.4.1, origin/release/2.4)
Author: pytorchbot <soumith+bot@pytorch.org>
Date:   Wed Aug 28 17:25:42 2024 -0700

    [Doc] Fix rendering of the unicode characters (#134695)

    * [Doc] Fix rendering of the unicode characters (#134597)

    https://github.com/pytorch/pytorch/pull/124771 introduced unicode escape sequences inside raw strings, which were not rendered correctly. Also fix typo in `\uue0 ` escape sequence (should have been `\u00e0`)
    Fix it by relying on [string literal concatenation](https://docs.python.org/3/reference/lexical_analysis.html#string-literal-concatenation) to join raw and regular strings together during lexical analysis stage

    Fixes https://github.com/pytorch/pytorch/issues/134422

    Pull Request resolved: https://github.com/pytorch/pytorch/pull/134597
    Approved by: https://github.com/aorenste, https://github.com/Skylion007

    (cherry picked from commit 534f43ddce24ab6bafa3aed42ee3d68947073d3f)

    * Fix lint

    ---------

    Co-authored-by: Nikita Shulga <nshulga@meta.com>

build log:

Building wheel torch-2.4.1
-- Building version 2.4.1
cmake --build . --target install --config Release
[1/2055] Linking CXX shared library lib/libc10.so
FAILED: lib/libc10.so
: && /usr/bin/c++ -fPIC -ffunction-sections -fdata-sections -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG  -T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld -Wl,--no-as-needed  -T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld -rdynamic   -Wl,--no-as-needed -shared -Wl,-soname,libc10.so -o lib/libc10.so c10/CMakeFiles/c10.dir/core/Allocator.cpp.o c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o c10/CMakeFiles/c10.dir/core/Device.cpp.o c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o c10/CMakeFiles/c10.dir/core/GradMode.cpp.o c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o c10/CMakeFiles/c10.dir/core/Scalar.cpp.o c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o c10/CMakeFiles/c10.dir/core/Storage.cpp.o c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o c10/CMakeFiles/c10.dir/core/Stream.cpp.o 
c10/CMakeFiles/c10.dir/core/SymBool.cpp.o c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o c10/CMakeFiles/c10.dir/core/SymInt.cpp.o c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o c10/CMakeFiles/c10.dir/util/C++17.cpp.o c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o c10/CMakeFiles/c10.dir/util/Exception.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Half.cpp.o c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o c10/CMakeFiles/c10.dir/util/Logging.cpp.o c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o 
c10/CMakeFiles/c10.dir/util/Optional.cpp.o c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o c10/CMakeFiles/c10.dir/util/TypeList.cpp.o c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Unicode.cpp.o c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o c10/CMakeFiles/c10.dir/util/complex_math.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o c10/CMakeFiles/c10.dir/util/int128.cpp.o c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o c10/CMakeFiles/c10.dir/util/numa.cpp.o c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o c10/CMakeFiles/c10.dir/util/tempfile.cpp.o c10/CMakeFiles/c10.dir/util/thread_name.cpp.o c10/CMakeFiles/c10.dir/util/typeid.cpp.o  -Wl,-rpath,:::::::  /usr/lib/aarch64-linux-gnu/libnuma.so  lib/libcpuinfo.a  -pthread && /usr/local/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=lib/libc10.so && :
/usr/bin/ld: error: linker script file '/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld' appears multiple times
collect2: error: ld returned 1 exit status
[8/2055] Building CXX object c10/test/CMakeFiles/c10_LeftRight_test.dir/util/LeftRight_test.cpp.o
ninja: build stopped: subcommand failed.
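Note that `-T/home/daniel/Work/pytorch_v2.4.1/cmake/linker_script.ld` appears twice in the failing link command, which points at the `USE_PRIORITIZED_TEXT_FOR_LD=1` option injecting the linker script into both the compile and the link flags. One workaround to try (a guess based on the duplicated flag, not a confirmed fix) is to drop that option and reconfigure from a clean tree:

```shell
# Disable the linker-script optimization that duplicated -T<linker_script.ld>.
unset USE_PRIORITIZED_TEXT_FOR_LD

# CMake caches keep old flags around, so wipe the build tree before
# reconfiguring. (Run from the pytorch source checkout.)
if [ -f setup.py ]; then
  rm -rf build
  python3 setup.py clean
  python3 setup.py bdist_wheel
else
  echo "run this from the pytorch source directory"
fi
```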

Hi,

Since PyTorch is a third-party library, they will know the compatibility better.

We have PyTorch prebuilt packages for the JetPack 5 environment up to v2.1.
You can find them here: PyTorch for Jetson
You can also get a newer PyTorch if the latest JetPack version is used.

Thanks.

Yes, I know. There is also nvidia-cuda-support on PyTorch, which points to selecting a supported CUDA version from their support matrix, and that is beyond what JetPack 5.1.4 offers.

That’s why I created this topic: long-term support for JetPack 5.1.4 / L4T 35.6 (Ubuntu 20.04).

Does NVIDIA still support PyTorch above v2.1 on JetPack 5?

  • I saw that Python 3.8 should support up to PyTorch v2.4.x, but apparently there are few or no resources on PyTorch above v2.1 for JetPack 5.

Any info or plan?

PS: Note that JetPack 5 supports ROS, while JetPack 6 supports ROS 2.

pytorch v2.3.1 build failed - CUDA kernel function #143935

@AastaLLL

Any idea about “cub bundled with your CUDA toolkit is too old.”? Is there a plan for JetPack 5.1.5, or some way to upgrade the CUDA toolkit on JetPack 5.1.4?

Hi,

We release PyTorch prebuilt wheels so users can pick them up directly without building on their own.
But since PyTorch is a third-party library, you will need to check with them whether they can support the combination you need.

It looks like you are running into many issues with PyTorch builds.
How about trying our TensorRT library, which works with JetPack 5 without building from source?
TensorRT is optimized for Jetson devices, and you can get it through JetPack directly.

Thanks.

Well, it seems OK. The PyTorch team is working on some of the issues now.

And as you can see in the previous threads, there is significant progress on building on the Jetson Orin Nano 8GB board.

PS: Anyone can try building the code; just follow my setup/build process and patches here: Linux 35.6 + JetPack v5.1.4: Compiling PyTorch

Is there any documentation or advice on how to use TensorRT for object detection with YOLO or other kinds of models?

Currently we are evaluating different kinds of open-source tech on the Jetson Orin Nano (edge compute).

Especially small multi-object detection/tracking; please share if you have any relevant info.

Hi,

Yes, you can find a YOLO example in our TensorRT GitHub:

There are many tutorials (for newer YOLO versions) shared by the community.
Please search for them online, as we cannot share third-party links here.
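For reference, the flow most community tutorials follow (a sketch — the model and file names are placeholders, and trtexec ships with JetPack under /usr/src/tensorrt/bin) is to export the YOLO model to ONNX first, then build a TensorRT engine on the Jetson:

```shell
# Build a TensorRT engine from an exported ONNX detector (placeholder names).
ONNX=yolov8s.onnx
ENGINE=yolov8s_fp16.engine
TRTEXEC=${TRTEXEC:-/usr/src/tensorrt/bin/trtexec}

if [ -x "$TRTEXEC" ] && [ -f "$ONNX" ]; then
  # FP16 is usually the right speed/accuracy trade-off on Orin.
  "$TRTEXEC" --onnx="$ONNX" --saveEngine="$ENGINE" --fp16
else
  echo "need trtexec ($TRTEXEC) and an exported ONNX model ($ONNX)"
fi
```

The resulting .engine file can then be loaded by DeepStream or a custom TensorRT runtime app. Note that engines are specific to the TensorRT version and GPU they were built on, so build them on the target Jetson.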

Thanks.

Thanks.
