OK great, so glad that you got it working!
@dusty_nv It seems I can't reach Google Drive either. I think all FTP sites are blocked on our side.
I can access this site for tensorflow for example: https://developer.download.nvidia.com/compute/redist/jp/v45/tensorflow/
Unfortunately I'm unable to host these unofficial wheels at places other than filesharing sites. The l4t-pytorch container is hosted on the NGC registry, which you may be able to access. Otherwise I recommend trying to download the wheels at home or from some other connection - sorry about that.
I built PyTorch 1.7 on a Xavier NX with Yocto OS successfully; however, when I try to import torch I get:
```
import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/root/.local/lib/python3.8/site-packages/torch/__init__.py", line 190, in <module>
    from torch._C import *
ImportError: /home/root/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: zgetrs
```
Can somebody help?
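For context, zgetrs is a LAPACK routine (it solves a complex double-precision linear system from an LU factorization), so an undefined-symbol error for it often suggests that libtorch_cpu.so was built expecting LAPACK but is not linked against a library that provides it. Below is a minimal diagnostic sketch, not something from the original post. It assumes that the candidate library names (`lapack`, `openblas`) are the ones your image would ship and that the symbol may carry a Fortran-style trailing underscore. The helper name `find_symbol` is made up for illustration.

```python
import ctypes
import ctypes.util


def find_symbol(symbol, candidates=("lapack", "openblas")):
    """Return {library_name: True/False/None} for each candidate library.

    True  -> library loaded and exports the symbol
    False -> library loaded but does not export the symbol
    None  -> library not found or could not be loaded
    """
    results = {}
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            results[name] = None
            continue
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            results[name] = None
            continue
        # Fortran symbols are often exported with a trailing underscore.
        results[name] = any(hasattr(lib, s) for s in (symbol, symbol + "_"))
    return results


# Candidate names are assumptions; adjust them for your image.
print(find_symbol("zgetrs"))
```

If every entry comes back as None or False, one thing to try is rebuilding with a BLAS/LAPACK provider available in the build sysroot so that the PyTorch build can find and link it.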
Hi all, I've built the wheel for PyTorch 1.10, here it is:
PyTorch v1.10.0
- JetPack 4.4 (L4T R32.4.3) / JetPack 4.4.1 (L4T R32.4.4) / JetPack 4.5 (L4T R32.5.0) / JetPack 4.5.1 (L4T R32.5.1) / JetPack 4.6 (L4T R32.6.1)
- Python 3.6 - torch-1.10.0-cp36-cp36m-linux_aarch64.whl
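The wheel filename encodes which interpreter it targets, which is why several later requests in this thread ask for other Python versions. As a rough sketch (the helper names are hypothetical, and the interpreter tag assumes CPython), the tags can be read from the filename and compared with the running interpreter:

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # name-version[-build]-python-abi-platform: the tags are the last three fields
    return tuple(parts[-3:])


def interpreter_python_tag():
    """CPython-style tag for the running interpreter, e.g. 'cp36'."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


wheel = "torch-1.10.0-cp36-cp36m-linux_aarch64.whl"
py_tag, abi_tag, plat_tag = parse_wheel_tags(wheel)
print(py_tag, abi_tag, plat_tag)  # cp36 cp36m linux_aarch64
print("matches this interpreter:", py_tag == interpreter_python_tag())
```

A cp36 wheel like this one is only expected to install on CPython 3.6 on an aarch64 Linux system; pip rejects it on other interpreter versions.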
Hi dusty,
I want to run PyTorch in develop mode on a Jetson TX2 (here), and as a first step I am trying to install PyTorch 1.4.0 from source following the steps given.
I am getting an error repeatedly at this step:
[1139/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorMathScan.cu.o
[1140/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorScatterGather.cu.o
[1141/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorSort.cu.o
[1142/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorTopK.cu.o
ninja: build stopped: subcommand failed.
-- Building version 1.4.0
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/wanderer/Desktop/workspace/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3/dist-packages -DNUMPY_INCLUDE_DIR=/usr/lib/python3/dist-packages/numpy/core/include -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include/python3.6m -DPYTHON_LIBRARY=/usr/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.4.0 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/wanderer/Desktop/workspace/pytorch
cmake --build . --target install --config Release -- -j 6
Traceback (most recent call last):
File "setup.py", line 755, in <module>
build_deps()
File "setup.py", line 316, in build_deps
cmake=cmake)
File "/home/wanderer/Desktop/workspace/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 335, in build
self.run(build_args, my_env)
File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.
System configuration:
CUDA runtime version: 10.2.300
OS: Ubuntu 18.04.5 LTS (aarch64)
GCC version: (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.10.2
Libc version: glibc-2.25
Python version: 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.9.253-tegra-aarch64-with-Ubuntu-18.04-bionic
I have followed all the steps required and applied the patch too. Any help on resolving this is highly appreciated.
Thanks a lot!
@devileash hmm, I don't see what the actual error is from your log (only that there was some error).
I'm also not very familiar with develop mode, so I'm not sure what the issue is - sorry about that. Were there more detailed error messages available?
Hi @dusty_nv,
Thanks a lot for replying. There is another error message before it but I am not sure if it helps:
[1136/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorRandom.cu.o
[1137/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o
FAILED: caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o
cd /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC && /usr/bin/cmake -E make_directory /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o -D generated_cubin_file:STRING=/home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o.cubin.txt -P /home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o.Release.cmake
Killed
CMake Error at torch_generated_THCTensorIndex.cu.o.Release.cmake:281 (message):
Error generating file
/home/wanderer/Desktop/workspace/pytorch/build/caffe2/CMakeFiles/torch.dir/__/aten/src/THC/./torch_generated_THCTensorIndex.cu.o
[1138/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorMathPairwise.cu.o
[1139/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorMathScan.cu.o
[1140/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorScatterGather.cu.o
[1141/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorSort.cu.o
[1142/2833] Building NVCC (Device) object caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorTopK.cu.o
ninja: build stopped: subcommand failed.
-- Building version 1.4.0
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/wanderer/Desktop/workspace/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3/dist-packages -DNUMPY_INCLUDE_DIR=/usr/lib/python3/dist-packages/numpy/core/include -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include/python3.6m -DPYTHON_LIBRARY=/usr/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.4.0 -DUSE_DISTRIBUTED=0 -DUSE_NCCL=0 -DUSE_NUMPY=True -DUSE_PYTORCH_QNNPACK=0 -DUSE_QNNPACK=0 /home/wanderer/Desktop/workspace/pytorch
cmake --build . --target install --config Release -- -j 6
Traceback (most recent call last):
File "setup.py", line 755, in <module>
build_deps()
File "setup.py", line 316, in build_deps
cmake=cmake)
File "/home/wanderer/Desktop/workspace/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 335, in build
self.run(build_args, my_env)
File "/home/wanderer/Desktop/workspace/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '6']' returned non-zero exit status 1.
Please let me know which files I can look for more errors if required.
Thank you!
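For long ninja logs like the one above, the root cause is usually some way above the final traceback. A small sketch for pulling out the likely culprit lines, assuming the build output was saved to a file (the marker strings are taken from the log above; the function name and the shortened sample log are illustrative only):

```python
def find_build_failures(log_text, context=3):
    """Return a list of (line_number, snippet) for likely root-cause lines."""
    markers = ("FAILED:", "Killed", "error:", "CMake Error")
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if any(m in line for m in markers):
            # Keep a few following lines so the message is readable in context.
            snippet = "\n".join(lines[i:i + context])
            hits.append((i + 1, snippet))
    return hits


# Shortened, illustrative excerpt of the log above.
sample = """[1137/2833] Building NVCC (Device) object torch_generated_THCTensorIndex.cu.o
FAILED: caffe2/CMakeFiles/torch.dir/__/aten/src/THC/torch_generated_THCTensorIndex.cu.o
Killed
CMake Error at torch_generated_THCTensorIndex.cu.o.Release.cmake:281 (message):
ninja: build stopped: subcommand failed."""

for lineno, snippet in find_build_failures(sample, context=1):
    print(lineno, snippet[:80])
```

On a real build you would pass the contents of the saved log, for example `open("build.log").read()`, instead of the sample string.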
OK, so the Killed message typically means that your board ran out of memory while compiling. Try mounting swap memory like this:
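A minimal sketch of one common way to do this, assuming a 4 GB swap file at `/swapfile` (both the size and the path are illustrative assumptions, not taken from the reply above). The helper checks `/proc/meminfo`-style text for existing swap and prints the usual `fallocate` / `mkswap` / `swapon` sequence; the function names and the sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kibibytes}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_commands(size_gb=4, path="/swapfile"):
    """Shell commands (to be run as root) that create and enable a swap file."""
    return [
        "fallocate -l {}G {}".format(size_gb, path),
        "chmod 600 {}".format(path),
        "mkswap {}".format(path),
        "swapon {}".format(path),
    ]


# Illustrative values; on the board, read open("/proc/meminfo").read() instead.
sample = "MemTotal:        8047264 kB\nSwapTotal:             0 kB\n"
info = parse_meminfo(sample)
if info.get("SwapTotal", 0) == 0:
    print("No swap configured; commands to add some:")
    for cmd in swap_commands():
        print("  sudo " + cmd)
```

Lowering build parallelism can also reduce peak memory use; the PyTorch build reads the `MAX_JOBS` environment variable for this.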
Cannot install libopenmpi-dev
I run into this issue while installing:
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
The following packages have unmet dependencies:
libibverbs-dev : Depends: libibverbs1 (= 17.1-1) but 17.1-1ubuntu0.2 is to be installed
libopenmpi-dev : Depends: libhwloc-dev but it is not going to be installed
Thanks a lot @dusty_nv, the error was resolved by adding swap.
But I am still getting some errors related to cuDNN and Caffe2, so I re-flashed the Jetson TX2 and started the steps again, and I am getting the same errors again. Some of the errors are:
[2357/2833] Building CXX object caffe2/CMakeFiles/torch.dir/operators/conv_op_cache_cudnn.cc.o
[2358/2833] Building CXX object caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o
FAILED: caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o
/usr/bin/c++ -DAT_PARALLEL_OPENMP=1 -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_CUDA -D_FILE_OFFSET_BITS=64 -Dtorch_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.6m -isystem /usr/lib/python3/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /opt/rocm/hip/include -isystem /include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -I../caffe2/../torch/csrc/api -I../caffe2/../torch/csrc/api/include -I../caffe2/aten/src/TH -Icaffe2/aten/src/TH -I../caffe2/../torch/../aten/src -Icaffe2/aten/src -Icaffe2/../aten/src -Icaffe2/../aten/src/ATen -I../caffe2/../torch/csrc -I../caffe2/../torch/../third_party/miniz-2.0.8 -I../aten/src/TH -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -isystem include -Icaffe2/aten/src/THC -I../aten/src/THC -I../aten/src/THCUNN -I../aten/src/ATen/cuda -I../c10/.. -I../third_party/NNPACK/include -I../third_party/pthreadpool/include -I../third_party/cpuinfo/include -I../third_party/FP16/include -I../c10/cuda/../.. 
-Wno-deprecated -fvisibility-inlines-hidden -fopenmp -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow -O3 -fPIC -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -DCAFFE2_BUILD_MAIN_LIB -std=gnu++11 -MD -MT caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o -MF caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o.d -o caffe2/CMakeFiles/torch.dir/operators/conv_op_cudnn.cc.o -c ../caffe2/operators/conv_op_cudnn.cc
In file included from ../caffe2/core/context_gpu.h:20:0,
from ../caffe2/operators/conv_op_cudnn.cc:4:
../caffe2/operators/conv_op_cudnn.cc: In member function ‘bool caffe2::CudnnConvOp::DoRunWithType()’:
../caffe2/operators/conv_op_cudnn.cc:760:11: error: ‘CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
....
../caffe2/operators/conv_op_cudnn.cc:754:21: error: there are no arguments to ‘cudnnGetConvolutionForwardAlgorithm’ that depend on a template parameter, so a declaration of ‘cudnnGetConvolutionForwardAlgorithm’ must be available [-fpermissive]
CUDNN_ENFORCE(cudnnGetConvolutionForwardAlgorithm(
....
../caffe2/operators/conv_op_cudnn.cc: In member function ‘bool caffe2::CudnnConvGradientOp::DoRunWithType()’:
../caffe2/operators/conv_op_cudnn.cc:1173:11: error: ‘CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT’ was not declared in this scope
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
....
../caffe2/operators/conv_op_cudnn.cc:1167:21: error: there are no arguments to ‘cudnnGetConvolutionBackwardFilterAlgorithm’ that depend on a template parameter, so a declaration of ‘cudnnGetConvolutionBackwardFilterAlgorithm’ must be available [-fpermissive]
CUDNN_ENFORCE(cudnnGetConvolutionBackwardFilterAlgorithm(
....
../caffe2/operators/conv_op_cudnn.cc:860:16: required from here
../caffe2/operators/conv_op_cudnn.cc:754:56: error: ‘cudnnGetConvolutionForwardAlgorithm’ was not declared in this scope
CUDNN_ENFORCE(cudnnGetConvolutionForwardAlgorithm(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
cudnn_wrapper_.inline_cudnn_handle(),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bottom_desc_,
~~~~~~~~~~~~~
filter_desc_,
~~~~~~~~~~~~~
conv_desc_,
~~~~~~~~~~~
top_desc_,
~~~~~~~~~~
CUDNN_CONVOLUTION_FWD_SPECIFY_WORKSPACE_LIMIT,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cudnn_ws_nbytes_limit_,
~~~~~~~~~~~~~~~~~~~~~~~
&algo_));
~~~~~~~
../caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
cudnnStatus_t status = condition;
...
../caffe2/operators/conv_op_cudnn.cc: In instantiation of ‘bool caffe2::CudnnConvGradientOp::DoRunWithType() [with T_X = float; T_DY = float; T_W = float; T_B = float; T_DX = float; T_DW = float; T_DB = float]’:
../caffe2/operators/conv_op_cudnn.cc:1440:16: required from here
../caffe2/operators/conv_op_cudnn.cc:1167:63: error: ‘cudnnGetConvolutionBackwardFilterAlgorithm’ was not declared in this scope
CUDNN_ENFORCE(cudnnGetConvolutionBackwardFilterAlgorithm(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
cudnn_wrapper_.inline_cudnn_handle(),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bottom_desc_,
~~~~~~~~~~~~~
top_desc_,
~~~~~~~~~~
bwd_filter_conv_desc_,
~~~~~~~~~~~~~~~~~~~~~~
filter_desc_,
~~~~~~~~~~~~~
CUDNN_CONVOLUTION_BWD_FILTER_SPECIFY_WORKSPACE_LIMIT,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cudnn_ws_nbytes_limit_,
~~~~~~~~~~~~~~~~~~~~~~~
&bwd_filter_algo_));
~~~~~~~~~~~~~~~~~~
../caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
cudnnStatus_t status = condition; \
I ran the command python3 setup.py develop &> build.log and got the above errors. Searching online suggests that this is due to an incompatibility between Caffe and cuDNN, and that it can be resolved by following the patch in "Install Caffe on Jetson Nano - Q-engineering". However, the error here is in Caffe2 rather than Caffe, and Caffe2 is being built directly from the PyTorch source.
My system configuration is:
$ python3 jetsonInfo.py
NVIDIA Jetson TX2
L4T 32.4.3 [ JetPack 4.4 ]
Ubuntu 18.04.4 LTS
Kernel Version: 4.9.140-tegra
CUDA 10.2.89
CUDA Architecture: 6.2
OpenCV version: 4.1.1
OpenCV Cuda: NO
CUDNN: 8.0.0.180
TensorRT: 7.1.3.0
Vision Works: 1.6.0.501
VPI: 0.3.7
Vulcan: 1.2.70
Please suggest any solutions to this. I am trying to build the vanilla pytorch 1.4.0 version on Jetson TX2. Any help is highly appreciated.
Thank you in advance!
Hi @vision-hobbist1995, this seems kind of related to your other post here:
It seems that you are having trouble installing a variety of packages. I'm not sure if you did an apt dist-upgrade or if the system's package manager somehow got into a broken state. My recommendation would be to back up your work and re-flash your SD card (or a different SD card).
Hi @devileash, I'm not sure which version of JetPack you are on, but given the cuDNN errors I believe that PyTorch 1.4 is too old to be compatible with your version of JetPack/cuDNN. This is why the PyTorch wheels provided for JetPack 4.4 and newer are only for PyTorch >= 1.6 (because updates to PyTorch were required to support cuDNN 8).
My recommendation would be to move to a newer version of PyTorch to maintain compatibility (or alternatively, use an older version of JetPack).
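The compiler errors earlier in the thread are consistent with this: cuDNN 8 no longer declares `cudnnGetConvolutionForwardAlgorithm` or the `*_SPECIFY_WORKSPACE_LIMIT` preference enums that the PyTorch 1.4 sources still call. A rough sketch of the rule of thumb from the reply above (cuDNN 8 needs PyTorch 1.6 or newer), with a hypothetical helper name and the version strings reported in this thread as inputs:

```python
def version_tuple(version):
    """Turn '1.4.0' or '8.0.0.180' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


# Assumption taken from the reply above: cuDNN 8 needs PyTorch 1.6 or newer.
MIN_TORCH_FOR_CUDNN8 = (1, 6)


def needs_newer_torch(torch_version, cudnn_version):
    """True if this PyTorch release predates cuDNN 8 support on a cuDNN 8+ system."""
    cudnn_major = version_tuple(cudnn_version)[0]
    if cudnn_major < 8:
        return False  # this sketch only flags the cuDNN 8 case
    return version_tuple(torch_version)[:2] < MIN_TORCH_FOR_CUDNN8


print(needs_newer_torch("1.4.0", "8.0.0.180"))   # True
print(needs_newer_torch("1.10.0", "8.0.0.180"))  # False
```

This is only a sanity check on version numbers; it says nothing about other incompatibilities between a given PyTorch release and a JetPack image.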
I need the same PyTorch 1.8 with CUDA 10.2 wheel files for the latest Python versions (Python 3.8 or 3.9). That'd be really helpful!
Hi @ihimu, please see my reply to your other topic here:
Or maybe someone else on this thread who built a Python 3.8 wheel can share it with you.
Hi ken, can you share the PyTorch wheel for Python 3.7? I'm struggling to build it from source, and that is too slow…
Thank you! The Google Drive link for PyTorch 1.7 worked for me.
If you are not tied to Python 3.7 and can use Python 3.8, then use the Google Drive link:
This is the same link that @dusty_nv had linked to in the above reply.
It installs PyTorch 1.7.0 with CUDA enabled on Jetson Nano.
Hi @dusty_nv,
Is there any chance you can release pip wheels built for cp39-cp39? I'm developing with Jetpack-4.6 and -oe4t which is currently on the honister release of Yocto. That release's meta-python layer is at python-3.9. Yocto recently made a change to its recipe syntax that will make rolling back to an earlier version of Python somewhat painful.
Alternatively I am considering using the docker containers here: [NVIDIA L4T PyTorch | NVIDIA NGC]
but I am uncertain how to do a build-time deployment of that. Perhaps I could install the container on the running target, then scp it over to my host and add it as a source file to a bitbake recipe? I don't know if that is feasible, but I may try that out if I can't get a Python 3.9 based wheel from anywhere.
Thanks
-David
Thanks