I have a project using Python 3.11/3.12 with TTS libraries on top of torch, and I need to run tests on my Jetson Xavier NX.
After five days I still haven't managed to get beyond JetPack 35.6, which ships CUDA 11.4 and Python 3.8.2.
So is there a way to make the GPU available through a virtual Python env? Thanks
Hi,
You will need to build it from source for a custom Python version.
If you are looking for the build instructions, please check below:
(you might need to update some steps for your custom Python accordingly)
Thanks.
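For the virtual-env part of the question, the usual pattern is to create the venv from the interpreter you want and run the build and install from inside it. A minimal sketch (assuming the target interpreter is already installed on the device; plain `python3` is used here, substitute `python3.12` once you have it):

```shell
# Create and activate a venv from the target interpreter
# (replace python3 with python3.12 on the Jetson once it is installed)
python3 -m venv "$HOME/torch312-venv"
. "$HOME/torch312-venv/bin/activate"

# Any wheel produced by setup.py bdist_wheel can then be pip-installed
# into this venv, and the build itself picks up this interpreter
command -v python
```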
thank you very much AastraLLL
Argh! I have some issues compiling PyTorch :(
FAILED: confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o
/usr/bin/cc -DFXDIV_USE_INLINE_ASSEMBLY=0 -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=1 -DXNN_ENABLE_ARM_SME=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=0 -DXNN_ENABLE_AVX256VNNI=0 -DXNN_ENABLE_AVX256VNNIGFNI=0 -DXNN_ENABLE_AVX512AMX=0 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=0 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=0 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/home/workmin/repos/pytorch/third_party/XNNPACK/include -I/home/workmin/repos/pytorch/third_party/XNNPACK/src -I/home/workmin/repos/pytorch/third_party/pthreadpool/include -I/home/workmin/repos/pytorch/third_party/FXdiv/include -isystem /home/workmin/repos/pytorch/third_party/protobuf/src -O3 -DNDEBUG -std=c99 -fPIC -march=native -Wno-psabi -O2 -pthread -fno-math-errno -march=armv8.2-a+sve+sve2 -MD -MT confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o -MF confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o.d -o confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o -c /home/workmin/repos/pytorch/third_party/XNNPACK/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c
cc1: error: invalid feature modifier ‘sve2’ in ‘-march=armv8.2-a+sve+sve2’
cc1: note: valid arguments are: fp simd crypto crc lse fp16 rcpc rdma dotprod aes sha2 sha3 sm4 fp16fml sve profile rng memtag sb ssbs predres; did you mean ‘sve’?
[724/6157] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-vunary/gen/f32-vsqr-neon.c.o
any clue?
thanks
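One quick way to confirm locally whether the toolchain even knows the `sve2` modifier (a sketch; it assumes `gcc` is the same compiler CMake picked up — older gcc releases reject the modifier exactly as in the log above):

```shell
# Probe the compiler with a trivial translation unit; if the -march
# string is rejected, the failure matches the XNNPACK build error
echo 'int main(void){return 0;}' > /tmp/sve2_probe.c
if gcc -march=armv8.2-a+sve+sve2 -c /tmp/sve2_probe.c -o /tmp/sve2_probe.o 2>/dev/null; then
    echo "sve2: supported"
else
    echo "sve2: not supported"
fi
```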
Hi,
The error relates to XNNPACK.
Based on their documentation, Jetson (L4T) is listed among the supported architectures:
Do you need this backend?
If not, maybe you can disable it and rebuild PyTorch.
$ export USE_XNNPACK=0
...
Thanks.
I think I do not need that backend. It's for using the GPU for TTS, text-to-speech (bark, coqui-tts, etc…).
In fact, what I need is very simple:
pytorch, torchaudio and torchvision working with CUDA 11.4 and Python 3.12,
so that I can work on my TTS project.
Still failing after many attempts. Here is what I did:
export PYTORCH_BUILD_VERSION=2.4.1
export PYTORCH_BUILD_NUMBER=1
export MAX_JOBS=$(nproc)
export USE_CUDA=1
export USE_CUDNN=1
export CUDA_HOME=/usr/local/cuda
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2" # Xavier NX uses sm_72
export USE_NCCL=1
export USE_SYSTEM_NCCL=1
export BUILD_TEST=0
export USE_XNNPACK=0
export USE_PYTORCH_QNNPACK=0
export USE_PYTORCH_MOBILE=0
export USE_DISTRIBUTED=0
export USE_TENSORRT=1
export USE_FBGEMM=0
export USE_KINETO=0
export USE_MKLDNN=0
export USE_QNNPACK=0
export NCCL_ROOT_DIR=/usr/local
export PYTHON_EXECUTABLE=$(which python3)
export CMAKE_PREFIX_PATH=${NCCL_ROOT_DIR}:${CMAKE_PREFIX_PATH}
export USE_FAKELOWP=0
export USE_PRIORITIZED_TEXT_FOR_LD=1
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
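With that many variables it is easy to lose one between shells; a small pre-flight loop (a sketch, checking only a hypothetical CUDA-related subset of the names above) can confirm they are actually set before starting the multi-hour build:

```shell
# Report any variable from the list that is unset or empty in this shell
for v in USE_CUDA CUDA_HOME TORCH_CUDA_ARCH_LIST PYTHON_EXECUTABLE; do
    eval "val=\${$v}"
    if [ -z "$val" ]; then
        echo "missing: $v"
    fi
done
```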
git clone --branch v2.4.1 https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync
git submodule update --init --recursive
cmake -S . -B build \
    -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_CUDA=ON \
    -DUSE_NCCL=ON \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCMAKE_CXX_COMPILER=/usr/bin/g++ \
    -DCMAKE_C_COMPILER=/usr/bin/gcc \
    -DUSE_DISTRIBUTED=OFF \
    -DCMAKE_PREFIX_PATH=$(python -c "import sys; print(sys.prefix)") \
    -DUSE_NNPACK=OFF \
    -DUSE_TENSORRT=ON \
    -DUSE_SYSTEM_NCCL=ON \
    -DBLAS=OpenBLAS \
    -DOpenBLAS_INCLUDE_DIR=/usr/include/openblas \
    -DOpenBLAS_LIB=/usr/lib/aarch64-linux-gnu/libopenblas.so.0 \
    -DPYTHON_EXECUTABLE=$(which python)
cd build
ninja
cd ..
python setup.py clean
python setup.py bdist_wheel
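Once a wheel does build and install, a quick smoke test confirms torch can actually see the GPU (a sketch; it degrades gracefully when torch is not importable in the current environment):

```shell
# Import torch and report CUDA visibility; tolerate a missing install
python3 - <<'PY'
try:
    import torch
except ImportError:
    print("torch not installed in this environment")
else:
    print("torch", torch.__version__)
    print("cuda available:", torch.cuda.is_available())
PY
```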
FAILED: lib/libc10.so
: && /usr/bin/c++ -fPIC -ffunction-sections -fdata-sections -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -T/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld -Wl,--no-as-needed -T/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld -rdynamic -Wl,--no-as-needed -shared -Wl,-soname,libc10.so -o lib/libc10.so c10/CMakeFiles/c10.dir/core/Allocator.cpp.o c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o c10/CMakeFiles/c10.dir/core/Device.cpp.o c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o c10/CMakeFiles/c10.dir/core/GradMode.cpp.o c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o c10/CMakeFiles/c10.dir/core/Scalar.cpp.o c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o c10/CMakeFiles/c10.dir/core/Storage.cpp.o c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o c10/CMakeFiles/c10.dir/core/Stream.cpp.o 
c10/CMakeFiles/c10.dir/core/SymBool.cpp.o c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o c10/CMakeFiles/c10.dir/core/SymInt.cpp.o c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o c10/CMakeFiles/c10.dir/util/C++17.cpp.o c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o c10/CMakeFiles/c10.dir/util/Exception.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Half.cpp.o c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o c10/CMakeFiles/c10.dir/util/Logging.cpp.o c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o 
c10/CMakeFiles/c10.dir/util/Optional.cpp.o c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o c10/CMakeFiles/c10.dir/util/TypeList.cpp.o c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Unicode.cpp.o c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o c10/CMakeFiles/c10.dir/util/complex_math.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o c10/CMakeFiles/c10.dir/util/int128.cpp.o c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o c10/CMakeFiles/c10.dir/util/numa.cpp.o c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o c10/CMakeFiles/c10.dir/util/tempfile.cpp.o c10/CMakeFiles/c10.dir/util/thread_name.cpp.o c10/CMakeFiles/c10.dir/util/typeid.cpp.o -Wl,-rpath,::::::: /usr/lib/aarch64-linux-gnu/libnuma.so lib/libcpuinfo.a -pthread && /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=lib/libc10.so && :
/usr/bin/ld: error: linker script file ‘/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld’ appears multiple times
collect2: error: ld returned 1 exit status
any clue?
Hi,
We have prebuilt packages, but only for the default Python 3.8:
About the link error, could you try building with a single thread to see if that helps?
Thanks.
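The single-threaded retry suggested above amounts to (a sketch; `MAX_JOBS` is the job-count knob PyTorch's `setup.py` reads, and the commands are meant to be run from the pytorch checkout):

```shell
# Rebuild with one job so any failure is serialized and easier to read
export MAX_JOBS=1
python3 setup.py clean
python3 setup.py bdist_wheel
```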
I know! But I'm using some TTS libraries that require Python 3.12 as a minimum!
Hi,
When checking the L4T-R35.4.1 branch, there is a configuration that might relate to your error.
Could you turn it off and try it again?
Thanks.
OK, I'm going to try that after my current attempt, which hasn't finished yet… thanks