I have a project using Python 3.11/3.12 with TTS libraries on top of torch, and I need to run tests on my Jetson Xavier NX.
After five days I still haven't managed to get beyond JetPack 35.6, which ships CUDA 11.4 and Python 3.8.2.
So is there a way to make the GPU available through a virtual Python env? Thanks
Hi,
You will need to build it from source for a custom Python version.
If you are looking for the build instructions, please check below:
(you might need to update some steps for your custom Python accordingly)
Thanks.
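For the virtual-env part of the question, the usual pattern is to create the venv from the interpreter you want and run the build and install from inside it. A minimal sketch (assuming the target interpreter is already installed on the device; plain `python3` is used here, substitute `python3.12` once you have it):

```shell
# Create and activate a venv from the target interpreter
# (replace python3 with python3.12 on the Jetson once it is installed)
python3 -m venv "$HOME/torch312-venv"
. "$HOME/torch312-venv/bin/activate"

# Any wheel produced by setup.py bdist_wheel can then be pip-installed
# into this venv, and the build itself picks up this interpreter
command -v python
```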
thank you very much AastraLLL
Argh! I have some issues compiling PyTorch :(
FAILED: confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o
/usr/bin/cc -DFXDIV_USE_INLINE_ASSEMBLY=0 -DXNN_ENABLE_ARM_BF16=0 -DXNN_ENABLE_ARM_DOTPROD=1 -DXNN_ENABLE_ARM_FP16_SCALAR=1 -DXNN_ENABLE_ARM_FP16_VECTOR=1 -DXNN_ENABLE_ARM_I8MM=0 -DXNN_ENABLE_ARM_SME2=1 -DXNN_ENABLE_ARM_SME=1 -DXNN_ENABLE_ASSEMBLY=1 -DXNN_ENABLE_AVX256SKX=0 -DXNN_ENABLE_AVX256VNNI=0 -DXNN_ENABLE_AVX256VNNIGFNI=0 -DXNN_ENABLE_AVX512AMX=0 -DXNN_ENABLE_AVX512F=1 -DXNN_ENABLE_AVX512FP16=0 -DXNN_ENABLE_AVX512SKX=1 -DXNN_ENABLE_AVX512VBMI=1 -DXNN_ENABLE_AVX512VNNI=1 -DXNN_ENABLE_AVX512VNNIGFNI=1 -DXNN_ENABLE_AVXVNNI=0 -DXNN_ENABLE_AVXVNNIINT8=0 -DXNN_ENABLE_CPUINFO=1 -DXNN_ENABLE_DWCONV_MULTIPASS=0 -DXNN_ENABLE_GEMM_M_SPECIALIZATION=1 -DXNN_ENABLE_HVX=1 -DXNN_ENABLE_KLEIDIAI=0 -DXNN_ENABLE_MEMOPT=1 -DXNN_ENABLE_RISCV_VECTOR=1 -DXNN_ENABLE_SPARSE=1 -DXNN_ENABLE_VSX=1 -I/home/workmin/repos/pytorch/third_party/XNNPACK/include -I/home/workmin/repos/pytorch/third_party/XNNPACK/src -I/home/workmin/repos/pytorch/third_party/pthreadpool/include -I/home/workmin/repos/pytorch/third_party/FXdiv/include -isystem /home/workmin/repos/pytorch/third_party/protobuf/src -O3 -DNDEBUG -std=c99 -fPIC -march=native -Wno-psabi -O2 -pthread -fno-math-errno -march=armv8.2-a+sve+sve2 -MD -MT confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o -MF confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o.d -o confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c.o -c /home/workmin/repos/pytorch/third_party/XNNPACK/src/pf32-gemm/pf32-gemm-32x32-minmax-neonsme2.c
cc1: error: invalid feature modifier ‘sve2’ in ‘-march=armv8.2-a+sve+sve2’
cc1: note: valid arguments are: fp simd crypto crc lse fp16 rcpc rdma dotprod aes sha2 sha3 sm4 fp16fml sve profile rng memtag sb ssbs predres; did you mean ‘sve’?
[724/6157] Building C object confu-deps/XNNPACK/CMakeFiles/microkernels-prod.dir/src/f32-vunary/gen/f32-vsqr-neon.c.o
any clue?
thanks
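One quick way to confirm locally whether the toolchain even knows the `sve2` modifier (a sketch; it assumes `gcc` is the same compiler CMake picked up — older gcc releases reject the modifier exactly as in the log above):

```shell
# Probe the compiler with a trivial translation unit; if the -march
# string is rejected, the failure matches the XNNPACK build error
echo 'int main(void){return 0;}' > /tmp/sve2_probe.c
if gcc -march=armv8.2-a+sve+sve2 -c /tmp/sve2_probe.c -o /tmp/sve2_probe.o 2>/dev/null; then
    echo "sve2: supported"
else
    echo "sve2: not supported"
fi
```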
Hi,
The error relates to XNNPACK.
Based on their documentation, Jetson (L4T) is listed among the supported architectures:
Do you need this backend?
If not, maybe you can disable it and rebuild PyTorch.
$ export USE_XNNPACK=0
...
Thanks.
I think I do not need that backend. It's for using the GPU for TTS, text-to-speech (bark, coqui-tts, etc…).
In fact, what I need is very simple:
pytorch, torchaudio and torchvision working with CUDA 11.4 and Python 3.12,
so that I can work on my TTS project.
Still failing after many attempts. Here is what I did:
export PYTORCH_BUILD_VERSION=2.4.1
export PYTORCH_BUILD_NUMBER=1
export MAX_JOBS=$(nproc)
export USE_CUDA=1
export USE_CUDNN=1
export CUDA_HOME=/usr/local/cuda
export TORCH_CUDA_ARCH_LIST="5.3;6.2;7.2" # Xavier NX uses sm_72
export USE_NCCL=1
export USE_SYSTEM_NCCL=1
export BUILD_TEST=0
export USE_XNNPACK=0
export USE_PYTORCH_QNNPACK=0
export USE_PYTORCH_MOBILE=0
export USE_DISTRIBUTED=0
export USE_TENSORRT=1
export USE_FBGEMM=0
export USE_KINETO=0
export USE_MKLDNN=0
export USE_QNNPACK=0
export NCCL_ROOT_DIR=/usr/local
export PYTHON_EXECUTABLE=$(which python3)
export CMAKE_PREFIX_PATH=${NCCL_ROOT_DIR}:${CMAKE_PREFIX_PATH}
export USE_FAKELOWP=0
export USE_PRIORITIZED_TEXT_FOR_LD=1
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
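With that many variables it is easy to lose one between shells; a small pre-flight loop (a sketch, checking only a hypothetical CUDA-related subset of the names above) can confirm they are actually set before starting the multi-hour build:

```shell
# Report any variable from the list that is unset or empty in this shell
for v in USE_CUDA CUDA_HOME TORCH_CUDA_ARCH_LIST PYTHON_EXECUTABLE; do
    eval "val=\${$v}"
    if [ -z "$val" ]; then
        echo "missing: $v"
    fi
done
```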
git clone --branch v2.4.1 https://github.com/pytorch/pytorch.git
cd pytorch
git submodule sync
git submodule update --init --recursive
cmake -S . -B build \
    -GNinja \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_CUDA=ON \
    -DUSE_NCCL=ON \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCMAKE_CXX_COMPILER=/usr/bin/g++ \
    -DCMAKE_C_COMPILER=/usr/bin/gcc \
    -DUSE_DISTRIBUTED=OFF \
    -DCMAKE_PREFIX_PATH=$(python -c "import sys; print(sys.prefix)") \
    -DUSE_NNPACK=OFF \
    -DUSE_TENSORRT=ON \
    -DUSE_SYSTEM_NCCL=ON \
    -DBLAS=OpenBLAS \
    -DOpenBLAS_INCLUDE_DIR=/usr/include/openblas \
    -DOpenBLAS_LIB=/usr/lib/aarch64-linux-gnu/libopenblas.so.0 \
    -DPYTHON_EXECUTABLE=$(which python)
cd build
ninja
cd ..
python setup.py clean
python setup.py bdist_wheel
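Once a wheel does build and install, a quick smoke test confirms torch can actually see the GPU (a sketch; it degrades gracefully when torch is not importable in the current environment):

```shell
# Import torch and report CUDA visibility; tolerate a missing install
python3 - <<'PY'
try:
    import torch
except ImportError:
    print("torch not installed in this environment")
else:
    print("torch", torch.__version__)
    print("cuda available:", torch.cuda.is_available())
PY
```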
FAILED: lib/libc10.so
: && /usr/bin/c++ -fPIC -ffunction-sections -fdata-sections -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -T/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld -Wl,--no-as-needed -T/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld -rdynamic -Wl,--no-as-needed -shared -Wl,-soname,libc10.so -o lib/libc10.so c10/CMakeFiles/c10.dir/core/Allocator.cpp.o c10/CMakeFiles/c10.dir/core/AutogradState.cpp.o c10/CMakeFiles/c10.dir/core/CPUAllocator.cpp.o c10/CMakeFiles/c10.dir/core/ConstantSymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/CopyBytes.cpp.o c10/CMakeFiles/c10.dir/core/DefaultDtype.cpp.o c10/CMakeFiles/c10.dir/core/Device.cpp.o c10/CMakeFiles/c10.dir/core/DeviceType.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKey.cpp.o c10/CMakeFiles/c10.dir/core/DispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/GeneratorImpl.cpp.o c10/CMakeFiles/c10.dir/core/GradMode.cpp.o c10/CMakeFiles/c10.dir/core/InferenceMode.cpp.o c10/CMakeFiles/c10.dir/core/RefcountedDeleter.cpp.o c10/CMakeFiles/c10.dir/core/SafePyObject.cpp.o c10/CMakeFiles/c10.dir/core/Scalar.cpp.o c10/CMakeFiles/c10.dir/core/ScalarType.cpp.o c10/CMakeFiles/c10.dir/core/Storage.cpp.o c10/CMakeFiles/c10.dir/core/StorageImpl.cpp.o c10/CMakeFiles/c10.dir/core/Stream.cpp.o 
c10/CMakeFiles/c10.dir/core/SymBool.cpp.o c10/CMakeFiles/c10.dir/core/SymFloat.cpp.o c10/CMakeFiles/c10.dir/core/SymInt.cpp.o c10/CMakeFiles/c10.dir/core/SymIntArrayRef.cpp.o c10/CMakeFiles/c10.dir/core/SymNodeImpl.cpp.o c10/CMakeFiles/c10.dir/core/SymbolicShapeMeta.cpp.o c10/CMakeFiles/c10.dir/core/TensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/TensorOptions.cpp.o c10/CMakeFiles/c10.dir/core/UndefinedTensorImpl.cpp.o c10/CMakeFiles/c10.dir/core/WrapDimMinimal.cpp.o c10/CMakeFiles/c10.dir/core/impl/COW.cpp.o c10/CMakeFiles/c10.dir/core/impl/COWDeleter.cpp.o c10/CMakeFiles/c10.dir/core/impl/DeviceGuardImplInterface.cpp.o c10/CMakeFiles/c10.dir/core/impl/GPUTrace.cpp.o c10/CMakeFiles/c10.dir/core/impl/HermeticPyObjectTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/LocalDispatchKeySet.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyInterpreter.cpp.o c10/CMakeFiles/c10.dir/core/impl/PyObjectSlot.cpp.o c10/CMakeFiles/c10.dir/core/impl/PythonDispatcherTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/SizesAndStrides.cpp.o c10/CMakeFiles/c10.dir/core/impl/TorchDispatchModeTLS.cpp.o c10/CMakeFiles/c10.dir/core/impl/alloc_cpu.cpp.o c10/CMakeFiles/c10.dir/core/thread_pool.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUCachingAllocator.cpp.o c10/CMakeFiles/c10.dir/mobile/CPUProfilingAllocator.cpp.o c10/CMakeFiles/c10.dir/util/ApproximateClock.cpp.o c10/CMakeFiles/c10.dir/util/Backtrace.cpp.o c10/CMakeFiles/c10.dir/util/Bfloat16.cpp.o c10/CMakeFiles/c10.dir/util/C++17.cpp.o c10/CMakeFiles/c10.dir/util/DeadlockDetection.cpp.o c10/CMakeFiles/c10.dir/util/Exception.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fn.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e4m3fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2.cpp.o c10/CMakeFiles/c10.dir/util/Float8_e5m2fnuz.cpp.o c10/CMakeFiles/c10.dir/util/Half.cpp.o c10/CMakeFiles/c10.dir/util/LeftRight.cpp.o c10/CMakeFiles/c10.dir/util/Logging.cpp.o c10/CMakeFiles/c10.dir/util/MathConstants.cpp.o c10/CMakeFiles/c10.dir/util/Metaprogramming.cpp.o 
c10/CMakeFiles/c10.dir/util/Optional.cpp.o c10/CMakeFiles/c10.dir/util/ParallelGuard.cpp.o c10/CMakeFiles/c10.dir/util/SmallVector.cpp.o c10/CMakeFiles/c10.dir/util/StringUtil.cpp.o c10/CMakeFiles/c10.dir/util/ThreadLocalDebugInfo.cpp.o c10/CMakeFiles/c10.dir/util/TypeCast.cpp.o c10/CMakeFiles/c10.dir/util/TypeList.cpp.o c10/CMakeFiles/c10.dir/util/TypeTraits.cpp.o c10/CMakeFiles/c10.dir/util/Type_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Type_no_demangle.cpp.o c10/CMakeFiles/c10.dir/util/Unicode.cpp.o c10/CMakeFiles/c10.dir/util/UniqueVoidPtr.cpp.o c10/CMakeFiles/c10.dir/util/complex_math.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_gflags.cpp.o c10/CMakeFiles/c10.dir/util/flags_use_no_gflags.cpp.o c10/CMakeFiles/c10.dir/util/int128.cpp.o c10/CMakeFiles/c10.dir/util/intrusive_ptr.cpp.o c10/CMakeFiles/c10.dir/util/numa.cpp.o c10/CMakeFiles/c10.dir/util/signal_handler.cpp.o c10/CMakeFiles/c10.dir/util/tempfile.cpp.o c10/CMakeFiles/c10.dir/util/thread_name.cpp.o c10/CMakeFiles/c10.dir/util/typeid.cpp.o -Wl,-rpath,::::::: /usr/lib/aarch64-linux-gnu/libnuma.so lib/libcpuinfo.a -pthread && /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=lib/libc10.so && :
/usr/bin/ld: error: linker script file ‘/home/workmin/repos/pytorch_2_4_1/cmake/linker_script.ld’ appears multiple times
collect2: error: ld returned 1 exit status
any clue?
Hi,
We have prebuilt packages, but only for the default Python 3.8:
About the link error, could you try building with a single thread to see if that helps?
Thanks.
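The single-threaded retry suggested above amounts to (a sketch; `MAX_JOBS` is the job-count knob PyTorch's `setup.py` reads, and the commands are meant to be run from the pytorch checkout):

```shell
# Rebuild with one job so any failure is serialized and easier to read
export MAX_JOBS=1
python3 setup.py clean
python3 setup.py bdist_wheel
```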
I know! But I'm using some TTS libraries that require Python 3.12 as a minimum!
Hi,
When checking the L4T-R35.4.1 branch, there is a configuration that might relate to your error.
Could you turn it off and try it again?
Thanks.
OK, I'm going to try that after my current attempt, which hasn't finished yet… thanks