Upgrading CUDA for Autoware Compatibility and TensorRT Libs Not Accessible Inside the l4t-jetpack Container

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.8.1
[*] DRIVE OS 6.0.6
DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[*] Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
[*] DRIVE AGX Orin Developer Kit (not sure of its number)
other

SDK Manager Version
1.9.3.10904
[*] other

Host Machine Version
[*] native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
other

I am trying to build a Docker container on the NVIDIA DRIVE AGX Orin using a multi-stage build method, where I first use nvcr.io/nvidia/l4t-jetpack:<tag_version> as the base image. In the second stage I use arm64v8/ros:humble as the base image, leveraging CUDA/cuDNN/TensorRT and ROS to build a specific Autoware environment to run on the DRIVE AGX Orin. (PS: I am copying all the necessary libs and recreating the symlinks so that they are available in the next stage; a rough sketch follows.)
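
For reference, a rough sketch of the multi-stage Dockerfile (the tag, the copied paths, and the ENV values are illustrative assumptions; the real file copies more libs and recreates the symlinks):

# Stage 1: l4t-jetpack provides CUDA/cuDNN/TensorRT (tag is an assumption)
FROM nvcr.io/nvidia/l4t-jetpack:r35.1.0 AS jetpack

# Stage 2: ROS 2 Humble base for the Autoware build
FROM arm64v8/ros:humble
# Copy the GPU stacks across (illustrative subset of the required files)
COPY --from=jetpack /usr/local/cuda /usr/local/cuda
COPY --from=jetpack /usr/lib/aarch64-linux-gnu/libcudnn* /usr/lib/aarch64-linux-gnu/
COPY --from=jetpack /usr/lib/aarch64-linux-gnu/libnvinfer* /usr/lib/aarch64-linux-gnu/
ENV PATH=/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
# Refresh the linker cache so the copied libs resolve at run time
RUN ldconfig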

Currently I am facing some issues during the build process:

1. Initially I tested the existing tags of the l4t-jetpack image. For a first test with just stage 1, I ran a simple CUDA program (sketched below) inside the container: r36 was not compatible and r35 led to some JIT compiler errors, but in all the other cases (r35.3.1, r35.2.1, r35.1.0) I was able to access the GPU from inside the container. cuDNN tests also ran fine. However, I am facing issues with TensorRT, which is failing my whole build of the Autoware environment because some packages need TensorRT.
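
For illustration, the simple CUDA check looked something like this (a minimal sketch; the file name is arbitrary and sm_87 targets the Orin GPU):

cat > gpu_check.cu <<'EOF'
#include <cstdio>

int main() {
    int count = 0;
    // CUDA runtime API; nvcc pulls in cuda_runtime.h automatically for .cu files
    cudaError_t err = cudaGetDeviceCount(&count);
    std::printf("%s, devices: %d\n", cudaGetErrorString(err), count);
    return err == cudaSuccess ? 0 : 1;
}
EOF
nvcc -arch=sm_87 gpu_check.cu -o gpu_check && ./gpu_check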

To test the JetPack image versions standalone, I ran the containers like this:

docker run -it --gpus all --runtime nvidia nvcr.io/nvidia/l4t-jetpack:<tag_version> /bin/bash

Inside each container I ran

find / -name libnvdla_compiler.so

but the library was nowhere to be found (r35.3.1, r35.2.1, r35.1.0).

I tried to mount only the specific libraries that were missing inside the container, like this:

docker run -it -v /usr/lib/libnvdla_compiler.so:/usr/lib/libnvdla_compiler.so --gpus all --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r35.1.0 /bin/bash 

I read that these low-level libraries are generally flashed via SDK Manager. Soon I encountered more issues related to other missing libs (libnvmedia.so, libnvmedia_tensor.so, libnvmedia_dla.so, etc.).

Why are they not accessible from the host system inside the container in the first place?

2. I require assistance in updating CUDA from version 11.4 to 12.2 on my system, to ensure compatibility with the latest version of Autoware. Could you please provide guidance on the upgrade process?

JetPack is for Jetson platforms, and there are specific considerations when running Docker containers on DRIVE AGX Orin. Please refer to Running Docker Containers Directly on NVIDIA DRIVE AGX Orin | NVIDIA Technical Blog for initial guidance on running Docker on DRIVE AGX Orin.

Regarding the upgrade of CUDA from version 11.4 to 12.2: upgrading the CUDA version on DRIVE AGX Orin is not supported. The current version in the latest release, 6.0.8.1, is CUDA 11.4. Upgrading CUDA beyond the supported version may lead to compatibility issues.
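
(For reference, the CUDA toolkit version shipped on the target can be confirmed with nvcc, assuming /usr/local/cuda/bin is on the PATH:)

nvcc --version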

So, in that case, if I use l4t-tensorrt and install a compatible cuDNN, then I don't need to run my application inside a Docker container as mentioned in this post: https://developer.nvidia.com/blog/running-docker-containers-directly-on-nvidia-drive-agx-orin/. I guess L4T (Linux for Tegra) containers are supported on the NVIDIA DRIVE AGX Orin.

I tried running different versions of l4t-tensorrt and installing cuDNN manually. I am able to access the GPU inside the Docker container and cuDNN also works, but there is an issue with TensorRT: the container cannot find some BSP libs that TensorRT needs. Is there any workaround for that?
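
For reference, the containers were started the same way as before, just with the l4t-tensorrt image (the tag placeholder follows the convention used above):

docker run -it --gpus all --runtime nvidia nvcr.io/nvidia/l4t-tensorrt:<tag_version> /bin/bash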

Apart from that, I tried to compile some simple TensorRT code directly on the NVIDIA DRIVE AGX Orin (not inside the Docker container). It fails with the commands below; I checked that TensorRT is installed after flashing the DRIVE:

nvidia@tegra-ubuntu:~$ sudo find / -name "libnvinfer*" ! -path "/mnt/external-ssd/*"
[sudo] password for nvidia: 
/usr/share/doc/libnvinfer-plugin8
/usr/share/doc/libnvinfer-bin
/usr/share/doc/libnvinfer8
/usr/lib/aarch64-linux-gnu/libnvinfer_builder_resource.so.8.5.10
/usr/lib/aarch64-linux-gnu/libnvinfer.so.8
/usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.10
/usr/lib/aarch64-linux-gnu/libnvinfer.so.8.5.10
/usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8
/var/lib/dpkg/info/libnvinfer-plugin8.md5sums
/var/lib/dpkg/info/libnvinfer8.list
/var/lib/dpkg/info/libnvinfer-plugin8.list
/var/lib/dpkg/info/libnvinfer8.md5sums
/var/lib/dpkg/info/libnvinfer-bin.md5sums
/var/lib/dpkg/info/libnvinfer8.shlibs
/var/lib/dpkg/info/libnvinfer-plugin8.shlibs
/var/lib/dpkg/info/libnvinfer8.triggers
/var/lib/dpkg/info/libnvinfer-bin.list
/var/lib/dpkg/info/libnvinfer-plugin8.triggers
nvidia@tegra-ubuntu:~$ dpkg -l | grep TensorRT
ii  libnvinfer-bin                       8.5.10-1+cuda11.4                       arm64        TensorRT binaries
ii  libnvinfer-plugin8                   8.5.10-1+cuda11.4                       arm64        TensorRT plugin libraries
ii  libnvinfer8                          8.5.10-1+cuda11.4                       arm64        TensorRT runtime libraries
ii  libnvonnxparsers8                    8.5.10-1+cuda11.4                       arm64        TensorRT ONNX libraries
ii  libnvparsers8                        8.5.10-1+cuda11.4                       arm64        TensorRT parsers libraries

and the error while compiling the code:

 nvcc -arch=sm_87 test1_tensorrt.cpp -o tensorrt_test -lnvinfer
test1_tensorrt.cpp:1:10: fatal error: NvInfer.h: No such file or directory
    1 | #include <NvInfer.h>
      |          ^~~~~~~~~~~
compilation terminated.

I tried to look for this particular header file but it does not exist.

nvidia@tegra-ubuntu:~$ sudo find / -name "NvInfer.h" ! -path "/mnt/external-ssd/*"
find: ‘/proc/2439332’: No such file or directory
find: ‘/proc/2439333’: No such file or directory
find: ‘/proc/2439337’: No such file or directory

and I even tried setting the path variables:

nvidia@tegra-ubuntu:~$ export LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$LIBRARY_PATH
nvidia@tegra-ubuntu:~$ export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu:$LD_LIBRARY_PATH

But the issue persists.

Header files are not available on the target. Please copy them from the host/Docker environment.

Can you please elaborate or link to a source? I am confused about which files to copy, and to which location in the rootfs on the DRIVE.

Dear @gautam.kumar.jain1,
Please check /usr/include/aarch64-linux-gnu/ on the host (if SDK Manager was used to flash) or in Docker (the DRIVE OS 6.0.6 container), and copy the headers to /usr/include/aarch64-linux-gnu on the target.
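
For example, a minimal sketch assuming SDK Manager was used on the host and the target is reachable over SSH as nvidia@tegra-ubuntu (hostname and glob are illustrative; other TensorRT headers such as NvOnnxParser.h follow the same pattern):

# on the host used for flashing
scp /usr/include/aarch64-linux-gnu/NvInfer*.h nvidia@tegra-ubuntu:/tmp/
# on the target
sudo mv /tmp/NvInfer*.h /usr/include/aarch64-linux-gnu/

After copying, the nvcc command above should be able to find NvInfer.h.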

Dear @SivaRamaKrishnaNV,

I am trying to find a workaround for the lib-related issues so that I can run TensorRT inside a container on the NVIDIA DRIVE AGX Orin using l4t-tensorrt as the base image rather than l4t-jetpack. I mount the specific missing libs that TensorRT requires at compile time, like this:

docker run -it -v /usr/lib/libnvdla_compiler.so:/usr/lib/libnvdla_compiler.so --gpus all --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r35.1.0 /bin/bash

When I compile the code, it fails because of some missing libs. I tried searching for them on the DRIVE AGX Orin:

find / -name libnvmedia.so
find / -name libnvmedia_tensor.so
find / -name libnvmedia_dla.so

They are not available on the target, but I was able to find these libs on the host system from which I flashed the DRIVE via SDK Manager, under a path like:

 ttz_ad@TTZ-ad  ~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/lib/aarch64-linux-gnu/tegra  ls | grep libnvmedia
libnvmedia_2d.so
libnvmedia2d.so
libnvmedia_dla.so
libnvmedia_eglstream.so
libnvmedia_ide_parser.so
libnvmedia_ide_sci.so
libnvmedia_iep_sci.so
libnvmedia_ijpd_sci.so
libnvmedia_ijpe_sci.so
libnvmedia_iofa_sci.so
libnvmedia_isp_ext.so
libnvmedialdc.so
libnvmedia_sci_overlay.so
libnvmedia.so
libnvmedia_tensor.so

Can I copy these libs too and mount them while running the Docker containers? Will that make l4t-tensorrt usable on the NVIDIA DRIVE AGX Orin? Only TensorRT is an issue here: CUDA works (I am able to access the GPU inside the container) and cuDNN also works fine.

Dear @gautam.kumar.jain1,
You can try copying them and see if it works.

Regarding CUDA 12.x, as Vick clarified, it is not possible. Note that CUDA 12.x requires NVIDIA driver >= 525.60.13 (CUDA Compatibility :: NVIDIA Data Center GPU Driver Documentation), but DRIVE OS comes with 470.x.

@SivaRamaKrishnaNV

Yes, after copying all the libnvmedia_*.so files and mounting them while running the Docker container, I am able to access the GPU and compile the CUDA, cuDNN, and TensorRT code from inside the container. Thank you for your guidance.
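
For anyone hitting the same issue, the resulting workflow looks roughly like this (the exact set of libraries is release-dependent; these are the ones named above):

# on the host used for flashing: copy the missing BSP libs to the target
scp ~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_ORIN_TARGETS/Linux_for_Tegra/rootfs/usr/lib/aarch64-linux-gnu/tegra/libnvmedia*.so nvidia@tegra-ubuntu:/tmp/
# on the target: install them and mount them into the container
sudo mv /tmp/libnvmedia*.so /usr/lib/
docker run -it --gpus all --runtime nvidia \
  -v /usr/lib/libnvdla_compiler.so:/usr/lib/libnvdla_compiler.so \
  -v /usr/lib/libnvmedia.so:/usr/lib/libnvmedia.so \
  -v /usr/lib/libnvmedia_tensor.so:/usr/lib/libnvmedia_tensor.so \
  -v /usr/lib/libnvmedia_dla.so:/usr/lib/libnvmedia_dla.so \
  nvcr.io/nvidia/l4t-tensorrt:<tag_version> /bin/bash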
