Building PyTorch from source for DRIVE AGX Xavier

Hi,
When I try to build PyTorch from source, following the instructions in "PyTorch for Jetson Nano - version 1.5.0 now available", I get the following error:

running build_ext
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Detected CUDA at /usr/local/cuda
-- Not using MKLDNN
-- Not using NCCL
-- Building without distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch/lib/python3/dist-packages/caffe2/python/caffe2_pybind11_state.cpython-36m-aarch64-linux-gnu.so to /home/nvidia/dl/PyTorch/pytorch/build/lib.linux-aarch64-3.6/caffe2/python/caffe2_pybind11_state.cpython-36m-aarch64-linux-gnu.so

Copying extension caffe2.python.caffe2_pybind11_state_gpu
torch/lib/python3/dist-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-aarch64-linux-gnu.so does not exist
building 'torch._C' extension
aarch64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-M4RXmS/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.6m -c torch/csrc/stub.cpp -o build/temp.linux-aarch64-3.6/torch/csrc/stub.o -std=c++11 -Wall -Wextra -Wno-strict-overflow -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-deprecated-declarations -fno-strict-aliasing -Wno-missing-braces
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
aarch64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -specs=/usr/share/dpkg/no-pie-link.specs -Wl,-z,relro -Wl,-Bsymbolic-functions -specs=/usr/share/dpkg/no-pie-link.specs -Wl,-z,relro -g -fdebug-prefix-map=/build/python3.6-M4RXmS/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-aarch64-3.6/torch/csrc/stub.o -L/home/nvidia/dl/PyTorch/pytorch/torch/lib -L/usr/local/cuda/lib64 -lshm -ltorch_python -o build/lib.linux-aarch64-3.6/torch/_C.cpython-36m-aarch64-linux-gnu.so -Wl,--no-as-needed /home/nvidia/dl/PyTorch/pytorch/torch/lib/libcaffe2_gpu.so -Wl,--as-needed -Wl,-rpath,$ORIGIN/lib
aarch64-linux-gnu-g++: error: /home/nvidia/dl/PyTorch/pytorch/torch/lib/libcaffe2_gpu.so: No such file or directory
error: command 'aarch64-linux-gnu-g++' failed with exit status 1
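
The log above names two GPU artifacts that are missing by the time the link step runs. As a rough pre-check before re-running the build, something like the following could list which expected files are absent. This is only a sketch: the function name `missing_artifacts` is made up for illustration, and the paths in the usage note are copied from the log rather than taken from any official checklist.

```python
from pathlib import Path
from typing import Iterable, List


def missing_artifacts(root: Path, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as regular files under root."""
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    # Paths taken from the build log in this post; adjust the root to
    # wherever the PyTorch checkout lives on your machine.
    checkout = Path("/home/nvidia/dl/PyTorch/pytorch")
    expected = [
        "torch/lib/libcaffe2_gpu.so",
        "torch/lib/python3/dist-packages/caffe2/python/"
        "caffe2_pybind11_state_gpu.cpython-36m-aarch64-linux-gnu.so",
    ]
    for rel in missing_artifacts(checkout, expected):
        print("missing:", rel)
```

If both files are reported as missing, the failure happened earlier than the final link command, so the earlier part of the build output is the place to look.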

Hi @lijie2019,

The post you are referring to is for the Jetson family. The DRIVE products target different purposes and ship a different software stack.

It seems that the PyTorch 1.5.0 wheel (.whl) depends on cuDNN 8, which is not installed on the DRIVE platform with the current latest release, DRIVE Software 10.
Note: manually overwriting anything delivered as part of the package flashed onto the DRIVE AGX is not recommended.

So I assume you didn’t use a whl file.

Please provide the following information:

  • What is your hardware platform?
  • What software version are you using?
  • What are you trying to achieve?
  • Did you apply a patch to the source after cloning it? Which version did you clone?
  • Can you please describe the whole process you executed?

thanks

  • What is your hardware platform?
    The device is Xavier A in the DRIVE AGX Xavier™ Developer Kit.
  • What software version are you using?
    DRIVE Software 10
  • What are you trying to achieve?
    I am trying to install PyTorch 1.1.0 (it needs to be compatible with CUDA 10.2 in DRIVE Software 10.0).
  • Did you apply a patch to the source after cloning it? Which version did you clone?
    I didn’t apply any patches, and the branch is v1.1.0.
  • Can you please describe the whole process you executed?
    I just followed the instructions for Jetson in "PyTorch for Jetson Nano - version 1.5.0 now available", but skipped the “apply patches” step.

Hi @lijie2019,
Please try cloning and building the v1.5.0 tag of PyTorch, following the instructions you mentioned (PyTorch for Jetson Nano - version 1.5.0 now available), with only two additions:

  1. Before following the instructions, execute the following commands:
    sudo dpkg -i /var/cuda-repo-10-2-local-10.2.19/cuda-toolkit-10-2_10.2.19-1_arm64.deb (this will output an error)
    sudo apt --fix-broken -y install
    (This installs all the necessary include files. The DRIVE platform is meant for remote development rather than local development, so the include files are not present by default.)
  2. Apply a patch to the source as follows:
    In the file aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S (line 662),

replace the following lines:

MOV V8.4s, V9.4s
MOV v10.4s, v11.4s
MOV v12.4s, V13.4s
MOV V14.4s, V15.4s
MOV V16.4s, V17.4s
MOV V18.4s, V19.4s
MOV V20.4s, V21.4s
MOV V22.4s, V23.4s

with these:

MOV V8.16b, V9.16b
MOV v10.16b, v11.16b
MOV v12.16b, V13.16b
MOV V14.16b, V15.16b
MOV V16.16b, V17.16b
MOV V18.16b, V19.16b
MOV V20.16b, V21.16b
MOV V22.16b, V23.16b
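
If you prefer not to edit the file by hand, the replacement above could be scripted. The following is only a sketch, not part of the official instructions: the helper names `patch_mov_lines` and `patch_file` are made up for illustration, and it assumes the eight lines appear in the file exactly in the form "MOV Vn.4s, Vm.4s" with nothing else on the line. It only touches the eight register pairs listed above and leaves every other line alone.

```python
import re
from pathlib import Path

# The eight (destination, source) register pairs listed in the post.
_PAIRS = {(8, 9), (10, 11), (12, 13), (14, 15),
          (16, 17), (18, 19), (20, 21), (22, 23)}

# "MOV Vd.4s, Vn.4s" on a line of its own. Case-insensitive because the
# lines in the post mix "V" and "v".
_MOV_4S = re.compile(
    r"^(\s*MOV\s+)(V)(\d+)\.4s(\s*,\s*)(V)(\d+)\.4s(\s*)$",
    re.IGNORECASE,
)


def patch_mov_lines(text: str) -> str:
    """Rewrite the listed 'MOV Vd.4s, Vn.4s' lines to use the .16b arrangement."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        m = _MOV_4S.match(body)
        if m and (int(m.group(3)), int(m.group(6))) in _PAIRS:
            body = (f"{m.group(1)}{m.group(2)}{m.group(3)}.16b"
                    f"{m.group(4)}{m.group(5)}{m.group(6)}.16b{m.group(7)}")
        out.append(body + ending)
    return "".join(out)


def patch_file(path: Path) -> bool:
    """Patch the file in place; return True if anything changed."""
    original = path.read_text()
    patched = patch_mov_lines(original)
    if patched != original:
        path.write_text(patched)
        return True
    return False
```

Run from the root of the PyTorch checkout, the call would be patch_file(Path("aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S")). Running it a second time changes nothing, since the patched lines no longer match. Check the result with git diff before building.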

With these changes, the build completed and the wheel was created on my side. The “Verification” phase also passed.
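
For reference, a quick sanity check after installing the wheel could look like the sketch below. It is not the exact "Verification" script from the Jetson post, only an assumed minimal equivalent: it imports torch, reports the version and whether CUDA is visible, and runs one small GPU operation if it is. The function name `verify_torch` is made up for illustration.

```python
import importlib


def verify_torch() -> dict:
    """Collect basic facts about the installed torch package, if any."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # A broken or missing install lands here instead of crashing.
        return {"installed": False, "error": str(exc)}

    report = {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # One tiny GPU computation to confirm kernels actually run.
        x = torch.ones(2, device="cuda")
        report["gpu_sum"] = float((x + x).sum().item())
    return report


if __name__ == "__main__":
    print(verify_torch())
```

On a working GPU build, "cuda_available" should be True and "gpu_sum" should be 4.0. If "cuda_available" is False, the wheel was built without CUDA support or the runtime libraries cannot be found.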

NOTE: Please take into account that PyTorch is not optimized for the DRIVE platform. Using TensorRT is the right way to best utilize the hardware accelerators of the DRIVE AGX and thereby get the best DNN performance.

Please consider watching the following NVIDIA webinar for automotive customers:
Integrating DNN Inference into Autonomous Vehicle Applications with NVIDIA DriveWorks SDK

Thanks

Hi,
Thanks for your detailed reply.
Yesterday, when I used aptitude to resolve the dependencies, I didn’t realize that it could remove many packages and corrupt the system. So I had to re-flash DRIVE Software 10.0 to the AGX device. After flashing, I found that pip3 was not installed, and “sudo apt-get install python3-pip” also runs into dependency problems:

The following packages have unmet dependencies:
python3-pip : Depends: python-pip-whl (= 9.0.1-2) but 9.0.1-2.3~ubuntu1.18.04.1 is to be installed
Recommends: python3-dev (>= 3.2) but it is not going to be installed
Recommends: python3-wheel but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

The dependency problem can be resolved with aptitude, but its proposed solutions involve removing or downgrading packages, and I’m not sure whether that would affect the integrity of the system.

Hi @lijie2019,
Please try executing sudo apt update and sudo apt upgrade, and then try installing pip3 again.

Thanks for your prompt reply. With the RTC set correctly, pip3 can be installed after running apt update.