PyTorch + CUDA11.4 on 6.0.8.1

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Hello,

I would like to use PyTorch + CUDA 11.4 on DRIVE Orin, so I tried to install it by following the thread below:

However, I hit the error "fatal error: nvml.h: No such file or directory" and the compilation failed.

On DRIVE Orin, “nvml.h” was not found; in the Docker container based on DRIVE OS 6.0.8.1, it was found only for the x86_64 architecture.

On DRIVE Orin (flashed using the Docker container based on DRIVE OS 6.0.8.1):

nvidia@tegra-ubuntu:~$ sudo find / -name "*nvml*"
nvidia@tegra-ubuntu:~$ 

In the container based on DRIVE OS 6.0.8.1:

root@6.0.8.1-0006-build-linux-sdk:/drive# find / -name "*nvml*"
/usr/include/hwloc/nvml.h
/usr/share/doc/cuda-nvml-dev-11-4
/usr/local/cuda-11.4/targets/x86_64-linux/include/nvml.h
/usr/local/cuda-11.4/nvml
/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
root@6.0.8.1-0006-build-linux-sdk:/drive# 

But in the Docker container based on DRIVE OS 6.0.6.0, “nvml.h” also existed under /usr/local/cuda-11.4/targets/aarch64-linux/include:

root@6.0.6.0-0004-build-linux-sdk:/drive# find / -name "*nvml*"
/usr/include/hwloc/nvml.h
/usr/share/doc/cuda-nvml-dev-11-4
/usr/share/doc/cuda-nvml-cross-aarch64-11-4
/usr/local/cuda-11.4/targets/x86_64-linux/include/nvml.h
/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h
/usr/local/cuda-11.4/nvml
/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
/var/lib/dpkg/info/cuda-nvml-cross-aarch64-11-4.md5sums
/var/lib/dpkg/info/cuda-nvml-cross-aarch64-11-4.list
/drive/drive-linux/filesystem/targetfs/usr/share/doc/cuda-nvml-dev-11-4
/drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h
/drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/nvml
/drive/drive-linux/filesystem/targetfs/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/drive/drive-linux/filesystem/targetfs/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
root@6.0.6.0-0004-build-linux-sdk:/drive# 
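The per-architecture checks above can be wrapped in a small script. This is just a convenience sketch; the default CUDA root (/usr/local/cuda-11.4) is taken from the find output above, and `check_nvml` is a hypothetical helper name:

```shell
# Hedged sketch: report which CUDA target include trees ship nvml.h.
# Pass an alternative CUDA root as the first argument to override the default.
check_nvml() {
  root="${1:-/usr/local/cuda-11.4}"
  for arch in x86_64 aarch64; do
    if [ -f "$root/targets/$arch-linux/include/nvml.h" ]; then
      echo "$arch: nvml.h present"
    else
      echo "$arch: nvml.h missing"
    fi
  done
}
```

Run `check_nvml` inside the build container; on a 6.0.8.1 container, the expectation based on the output above is that only the x86_64 line reports the header as present.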

So I have two questions:

  1. Why does nvml for the aarch64 architecture not exist in DRIVE OS 6.0.8.1? Has it been deprecated?
  2. Could I copy nvml from the container based on DRIVE OS 6.0.6.0 (or other libraries that exist only in DRIVE OS 6.0.6.0 for aarch64) to DRIVE Orin or to the container based on DRIVE OS 6.0.8.1 and use it? Or is there another way to use nvml in the DRIVE OS 6.0.8.1 container?

Dear @naoki.tamemoto,
Could you check copying the needed header file from the 6.0.8.1 docker to the target?

Hi @SivaRamaKrishnaNV

Sorry, but I don’t understand. What should I check?

In the 6.0.8.1 docker, nvml.h exists under /usr/local/cuda-11.4/targets/x86_64-linux/include/, but not under /usr/local/cuda-11.4/targets/aarch64-linux/include/ or /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include.

I think the needed files (nvml.h and probably libnvidia-ml.so for aarch64) do not exist in the 6.0.8.1 docker in the first place.

Dear @naoki.tamemoto,
I noticed that libnvidia-ml.so is missing in DRIVE OS 6.0.8.1. I am checking on this and will get back to you.

Note that we don’t officially support PyTorch on DRIVE. We recommend the PyTorch → ONNX (on host) → TensorRT (on target) path for DL model deployment.

Could you check using the nvml library from DRIVE OS 6.0.6 or a JetPack release (from Jetson AGX Orin) and copying the x86 headers onto the target to see if that unblocks you? Let us know if there is any progress.

@SivaRamaKrishnaNV

Thank you for checking.

Could you check using the nvml library from DRIVE OS 6.0.6 or a JetPack release (from Jetson AGX Orin) and copying the x86 headers onto the target to see if that unblocks you?

I copied /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h and /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so from the 6.0.6 docker container to DRIVE Orin (flashed using the 6.0.8 docker container), and they seemingly work fine (compilation succeeded without blocking).
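The copy step can be sketched as a small function. This is an assumption-laden sketch: `install_nvml` is a hypothetical helper, and it presumes both trees are visible on one filesystem (e.g. the 6.0.6 targetfs extracted locally, or the two files already transferred to the target); the path layout is taken from the find output earlier in the thread:

```shell
# Hedged sketch: copy nvml.h and the stub libnvidia-ml.so from a 6.0.6
# CUDA aarch64 target tree (SRC) into the matching tree on the 6.0.8
# target (DST).
install_nvml() {
  src="$1"  # e.g. .../targetfs/usr/local/cuda-11.4/targets/aarch64-linux
  dst="$2"  # e.g. /usr/local/cuda-11.4/targets/aarch64-linux
  mkdir -p "$dst/include" "$dst/lib/stubs" || return 1
  cp "$src/include/nvml.h" "$dst/include/" || return 1
  cp "$src/lib/stubs/libnvidia-ml.so" "$dst/lib/stubs/" || return 1
  echo "nvml.h and libnvidia-ml.so installed under $dst"
}
```

Note that libnvidia-ml.so here is only the stub used at link time; whether the runtime NVML functionality actually works on the target is a separate question.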

I added some options to the command used in the post below:
cuDNN & TensorRT on SDK 6.0.6 - #12 by servanti

and the actual command I used was as follows:

BUILD_TORCH=ON \
CMAKE_PREFIX_PATH="/usr/bin/" \
LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:/usr/local/lib:$LD_LIBRARY_PATH \
CUDA_BIN_PATH=/usr/local/cuda-11.4/bin \
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4/ \
CUDNN_LIB_DIR=/usr/lib/aarch64-linux-gnu \
CUDNN_INCLUDE_DIR=/usr/include/aarch64-linux-gnu \
CUDNN_LIBRARY=/usr/lib/aarch64-linux-gnu/libcudnn.so \
CUDA_NVCC_EXECUTABLE=/usr/local/cuda-11.4/bin/nvcc \
CUDA_INCLUDE_DIRS=/usr/local/cuda-11.4/include \
CUDA_CUDART_LIBRARY=/usr/local/cuda-11.4/lib64/libcudart.so \
CUDA_CUDA_LIBRARY=/usr/local/cuda-11.4/lib64/stubs/libcuda.so  \
CUDA_HOST_COMPILER=cc \
USE_CUDA=1 \
USE_CUDNN=1 \
USE_NNPACK=1 \
TORCH_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1" \
CC=cc \
CXX=c++ \
TORCH_CUDA_ARCH_LIST="8.7" \
TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc \
CMAKE_CUDA_ARCHITECTURES="87" \
python3 setup.py bdist_wheel

Dear @naoki.tamemoto,
The L4T Jetson Orin instructions work for DRIVE Orin:

  1. Install the system packages required by PyTorch:

sudo apt-get -y update;
sudo apt-get -y install autoconf bc build-essential g++-8 gcc-8 clang-8 lld-8 gettext-base gfortran-8 iputils-ping libbz2-dev libc++-dev libcgal-dev libffi-dev libfreetype6-dev libhdf5-dev libjpeg-dev liblzma-dev libncurses5-dev libncursesw5-dev libpng-dev libreadline-dev libssl-dev libsqlite3-dev libxml2-dev libxslt-dev locales moreutils openssl python-openssl rsync scons python3-pip libopenblas-dev;

  2. Export the wheel URL with the following command:

export TORCH_INSTALL=https://developer.download.nvidia.cn/compute/redist/jp/v511/pytorch/torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl

Or download the wheel file and set:

export TORCH_INSTALL=path/to/torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl

  3. Install PyTorch:

python3 -m pip install --upgrade pip;
python3 -m pip install aiohttp numpy=='1.19.4' scipy=='1.5.3';
export "LD_LIBRARY_PATH=/usr/lib/llvm-8/lib:$LD_LIBRARY_PATH";
python3 -m pip install --upgrade protobuf;
python3 -m pip install --no-cache $TORCH_INSTALL
nvidia@tegra-ubuntu:~$ cat /etc/nvidia/version-ubuntu-rootfs.txt
6.0.8.1-34171226
nvidia@tegra-ubuntu:~$ python
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.0.0+nv23.05'
>>> torch.cuda.is_available()
True
>>>

@SivaRamaKrishnaNV

Thank you for your reply.

I will try this procedure later.

Dear @amin3672,
Did it work?

I am facing some issues with cuDNN while trying to install OpenCV with CUDA on DRIVE Orin. Additionally, I can’t install PyTorch for CUDA on the system. Is there an installation procedure for OpenCV using CUDA and cuDNN? Is there a PyTorch-for-GPU installation procedure? Can you share the documentation?

Dear @amin3672,
Please file a new topic for your issue.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.