PyTorch + CUDA11.4 on 6.0.8.1

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
other

SDK Manager Version
1.9.3.10904
other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

Hello,

I would like to use PyTorch + CUDA 11.4 on DRIVE Orin, so I tried to install it by following the thread below:

However, I hit the error "fatal error: nvml.h: No such file or directory" and the compilation failed.

On DRIVE Orin, “nvml.h” was not found; in the Docker container based on DRIVE OS 6.0.8.1, it was found only for the x86_64 architecture.

On DRIVE Orin (flashed using the Docker container based on DRIVE OS 6.0.8.1):

nvidia@tegra-ubuntu:~$ sudo find / -name "*nvml*"
nvidia@tegra-ubuntu:~$ 

In the container based on DRIVE OS 6.0.8.1:

root@6.0.8.1-0006-build-linux-sdk:/drive# find / -name "*nvml*"
/usr/include/hwloc/nvml.h
/usr/share/doc/cuda-nvml-dev-11-4
/usr/local/cuda-11.4/targets/x86_64-linux/include/nvml.h
/usr/local/cuda-11.4/nvml
/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
root@6.0.8.1-0006-build-linux-sdk:/drive# 

But in the Docker container based on DRIVE OS 6.0.6.0, “nvml.h” also existed under /usr/local/cuda-11.4/targets/aarch64-linux/include:

root@6.0.6.0-0004-build-linux-sdk:/drive# find / -name "*nvml*"
/usr/include/hwloc/nvml.h
/usr/share/doc/cuda-nvml-dev-11-4
/usr/share/doc/cuda-nvml-cross-aarch64-11-4
/usr/local/cuda-11.4/targets/x86_64-linux/include/nvml.h
/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h
/usr/local/cuda-11.4/nvml
/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
/var/lib/dpkg/info/cuda-nvml-cross-aarch64-11-4.md5sums
/var/lib/dpkg/info/cuda-nvml-cross-aarch64-11-4.list
/drive/drive-linux/filesystem/targetfs/usr/share/doc/cuda-nvml-dev-11-4
/drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h
/drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/nvml
/drive/drive-linux/filesystem/targetfs/var/lib/dpkg/info/cuda-nvml-dev-11-4.list
/drive/drive-linux/filesystem/targetfs/var/lib/dpkg/info/cuda-nvml-dev-11-4.md5sums
root@6.0.6.0-0004-build-linux-sdk:/drive# 
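The per-architecture checks above can be wrapped in a small script. This is just a convenience sketch; the default CUDA root (/usr/local/cuda-11.4) is taken from the find output above, and `check_nvml` is a hypothetical helper name:

```shell
# Hedged sketch: report which CUDA target include trees ship nvml.h.
# Pass an alternative CUDA root as the first argument to override the default.
check_nvml() {
  root="${1:-/usr/local/cuda-11.4}"
  for arch in x86_64 aarch64; do
    if [ -f "$root/targets/$arch-linux/include/nvml.h" ]; then
      echo "$arch: nvml.h present"
    else
      echo "$arch: nvml.h missing"
    fi
  done
}
```

Run `check_nvml` inside the build container; on a 6.0.8.1 container, the expectation based on the output above is that only the x86_64 line reports the header as present.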

So I have two questions:

  1. Why does nvml for the aarch64 architecture not exist in DRIVE OS 6.0.8.1? Has it been deprecated?
  2. Could I copy nvml from the container based on DRIVE OS 6.0.6.0 (or other libraries that exist only in DRIVE OS 6.0.6.0 for aarch64) to DRIVE Orin or to the container based on DRIVE OS 6.0.8.1 and use it? Or is there another way to use nvml in the DRIVE OS 6.0.8.1 container?

Dear @naoki.tamemoto,
Could you check copying the needed header file from the 6.0.8.1 docker to the target?

Hi @SivaRamaKrishnaNV

Sorry, but I don’t understand. What should I check?

In the 6.0.8.1 docker, nvml.h exists under /usr/local/cuda-11.4/targets/x86_64-linux/include/, but not under /usr/local/cuda-11.4/targets/aarch64-linux/include/ or /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include.

I think the needed files (nvml.h and probably libnvidia-ml.so for aarch64) do not exist in the 6.0.8.1 docker in the first place.

Dear @naoki.tamemoto,
I noticed that libnvidia-ml.so is missing in DRIVE OS 6.0.8.1. I am checking on this and will get back to you.

Note that we don’t officially support PyTorch on DRIVE. We recommend the PyTorch → ONNX (on host) → TensorRT (on target) path for DL model deployment.

Could you check using the nvml library from DRIVE OS 6.0.6 or a JetPack release (from Jetson AGX Orin) and copying the x86 headers onto the target to see if that unblocks you? Let us know if there is any progress.

@SivaRamaKrishnaNV

Thank you for checking.

Could you check using the nvml library from DRIVE OS 6.0.6 or a JetPack release (from Jetson AGX Orin) and copying the x86 headers onto the target to see if that unblocks you?

I copied /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/include/nvml.h and /drive/drive-linux/filesystem/targetfs/usr/local/cuda-11.4/targets/aarch64-linux/lib/stubs/libnvidia-ml.so from the 6.0.6 docker container to DRIVE Orin (flashed using the 6.0.8 docker container), and they seemingly work fine (compilation succeeded without blocking).
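The copy step can be sketched as a small function. This is an assumption-laden sketch: `install_nvml` is a hypothetical helper, and it presumes both trees are visible on one filesystem (e.g. the 6.0.6 targetfs extracted locally, or the two files already transferred to the target); the path layout is taken from the find output earlier in the thread:

```shell
# Hedged sketch: copy nvml.h and the stub libnvidia-ml.so from a 6.0.6
# CUDA aarch64 target tree (SRC) into the matching tree on the 6.0.8
# target (DST).
install_nvml() {
  src="$1"  # e.g. .../targetfs/usr/local/cuda-11.4/targets/aarch64-linux
  dst="$2"  # e.g. /usr/local/cuda-11.4/targets/aarch64-linux
  mkdir -p "$dst/include" "$dst/lib/stubs" || return 1
  cp "$src/include/nvml.h" "$dst/include/" || return 1
  cp "$src/lib/stubs/libnvidia-ml.so" "$dst/lib/stubs/" || return 1
  echo "nvml.h and libnvidia-ml.so installed under $dst"
}
```

Note that libnvidia-ml.so here is only the stub used at link time; whether the runtime NVML functionality actually works on the target is a separate question.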

I added some options to the command used in the post below:
cuDNN & TensorRT on SDK 6.0.6 - #12 by servanti

and the actual command I used was as follows:

BUILD_TORCH=ON \
CMAKE_PREFIX_PATH="/usr/bin/" \
LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:/usr/local/lib:$LD_LIBRARY_PATH \
CUDA_BIN_PATH=/usr/local/cuda-11.4/bin \
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-11.4/ \
CUDNN_LIB_DIR=/usr/lib/aarch64-linux-gnu \
CUDNN_INCLUDE_DIR=/usr/include/aarch64-linux-gnu \
CUDNN_LIBRARY=/usr/lib/aarch64-linux-gnu/libcudnn.so \
CUDA_NVCC_EXECUTABLE=/usr/local/cuda-11.4/bin/nvcc \
CUDA_INCLUDE_DIRS=/usr/local/cuda-11.4/include \
CUDA_CUDART_LIBRARY=/usr/local/cuda-11.4/lib64/libcudart.so \
CUDA_CUDA_LIBRARY=/usr/local/cuda-11.4/lib64/stubs/libcuda.so  \
CUDA_HOST_COMPILER=cc \
USE_CUDA=1 \
USE_CUDNN=1 \
USE_NNPACK=1 \
TORCH_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=1" \
CC=cc \
CXX=c++ \
TORCH_CUDA_ARCH_LIST="8.7" \
TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_CUDA_COMPILER=/usr/local/cuda-11.4/bin/nvcc \
CMAKE_CUDA_ARCHITECTURES="87" \
python3 setup.py bdist_wheel

Dear @naoki.tamemoto,
The L4T Jetson Orin instructions work for DRIVE Orin:

  1. Install the system packages required by PyTorch:

sudo apt-get -y update;
sudo apt-get -y install autoconf bc build-essential g++-8 gcc-8 clang-8 lld-8 gettext-base gfortran-8 iputils-ping libbz2-dev libc++-dev libcgal-dev libffi-dev libfreetype6-dev libhdf5-dev libjpeg-dev liblzma-dev libncurses5-dev libncursesw5-dev libpng-dev libreadline-dev libssl-dev libsqlite3-dev libxml2-dev libxslt-dev locales moreutils openssl python-openssl rsync scons python3-pip libopenblas-dev;

  2. Export the wheel URL with the following command:

export TORCH_INSTALL=https://developer.download.nvidia.cn/compute/redist/jp/v511/pytorch/torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl

Or download the wheel file and set:

export TORCH_INSTALL=path/to/torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl

  3. Install PyTorch:

python3 -m pip install --upgrade pip;
python3 -m pip install aiohttp numpy=='1.19.4' scipy=='1.5.3';
export "LD_LIBRARY_PATH=/usr/lib/llvm-8/lib:$LD_LIBRARY_PATH";
python3 -m pip install --upgrade protobuf;
python3 -m pip install --no-cache $TORCH_INSTALL
nvidia@tegra-ubuntu:~$ cat /etc/nvidia/version-ubuntu-rootfs.txt
6.0.8.1-34171226
nvidia@tegra-ubuntu:~$ python
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.0.0+nv23.05'
>>> torch.cuda.is_available()
True
>>>

@SivaRamaKrishnaNV

Thank you for your reply.

I will try this procedure later.

Dear @amin3672,
Did it work?

I am facing some issues with cuDNN while trying to install OpenCV with CUDA on DRIVE Orin. Additionally, I can’t install PyTorch for CUDA on the system. Is there an installation procedure for OpenCV using CUDA and cuDNN? Is there a PyTorch-for-GPU installation procedure? Can you share the documentation?

Dear @amin3672,
Please file a new topic for your issue.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.