DRIVE AGX: Failed to run sampleOnnxMNIST with the TensorRT C++ API

Please provide the following info (check/uncheck the boxes after clicking “+ Create Topic”):
Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.5.0.7774
other

Host Machine Version
native Ubuntu 18.04
other

Dear support team,

  1. I uploaded the TensorRT-7.2.3.4 release (Linux x86), including /samples, to the DRIVE AGX target platform.
  2. Set up the necessary dependencies (CUDA v11.1) for compiling /samples.
  3. The NVIDIA example
    TensorRT/sampleOnnxMNIST.cpp at master · NVIDIA/TensorRT · GitHub
    was compiled successfully for the aarch64 architecture.

After running the sample_onnx_mnist binary, I got:

Thread 1 “sample_onnx_mni” received signal SIGSEGV, Segmentation fault.
0x0000007fb3ac2278 in vtable for __cxxabiv1::__si_class_type_info ()
from /usr/lib/aarch64-linux-gnu/libstdc++.so.6

According to my investigation, it happens in SampleOnnxMNIST::infer() when we try to initialize the buffers for the created engine.

I got a similar error with my local project, which works correctly on the host PC.
In the Release Notes for DRIVE OS Linux 5.2.0.0, I found that the flashed version has TensorRT 6.3.1.3.

Could you please help me understand whether it is possible to use the TensorRT C++ API on the DRIVE AGX platform:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#c_topics

  1. It looks like, due to HW differences between the host PC (Ubuntu with a GTX 1070) and DRIVE AGX, I can’t allocate GPU buffers for inference.
    May I use the TensorRT C++ API on DRIVE AGX in the same way as on a host PC?
  2. I’ve found the API guide for the DRIVE AGX platform.
    Do I have to use only the mentioned C++ API?
  3. How is user data passed to the lower-layer GPU SoC HW? Special libs, drivers, etc.?
  4. May I use the DRIVE AGX environment for Yolov+ONNX inference tasks, or should it be another platform such as Jetson AGX Xavier?

I’ve attached the full failure log and environment information in DriveAGXlogs.tar.gz
DriveAGXlogs.tar.gz (1.4 KB)

I appreciate your help with the mentioned issue.

Dear @anton.nesterenko,
Firstly, you are trying to use TRT 7.2 compiled binaries on DRIVE AGX, which is not supported, as DRIVE OS 5.2.0 supports TRT 6.3.
When you flash the target using sdkmanager, TRT 6.3 gets installed on the host. You can cross-compile the TRT 6.3 samples (add a new sample similar to an existing TRT sample) and use them on the target.

1. It looks like, due to HW differences between the host PC (Ubuntu with a GTX 1070) and DRIVE AGX, I can’t allocate GPU buffers for inference.
May I use the TensorRT C++ API on DRIVE AGX in the same way as on a host PC?

Yes.

  2. I’ve found the API guide for the DRIVE AGX platform. Do I have to use only the mentioned C++ API?

Yes. There are API changes across different TRT versions. Please use the TRT version supported by the DRIVE release.

3. How is user data passed to the lower-layer GPU SoC HW? Special libs, drivers, etc.?

TRT APIs internally make CUDA API calls to use the GPU.

  4. May I use the DRIVE AGX environment for Yolov+ONNX inference tasks?

You can use the DRIVE AGX platform to perform inference. Make sure you generate the TRT model for the target platform successfully. You can use trtexec or the TRT APIs to generate a TRT engine for the target.

Dear SivaRamaKrishnaNV,

Thank you for the prompt extended reply.

I’ve tried to follow your recommendations.
All actions were done directly on target platform:

  1. Downloaded the TRT package v6.0.1.8:
    https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/6.0/GA_6.0.1.8/tars/TensorRT-6.0.1.8.Ubuntu-18.04.x86_64-gnu.cuda-10.2.cudnn7.6.tar.gz
    I didn’t find TRT 6.3 as you mentioned.
  2. Untarred the archive and set the compilation links to CUDA.
  3. sampleOnnxMNIST was compiled on the target platform.

After launching this version, I got an error at the ONNX parsing stage:

ERROR: ModelImporter.cpp:463 In function importModel:
[4] Assertion failed: !_importer_ctx.network()->hasImplicitBatchDimension() && "This version of the ONNX parser only supports TensorRT INetworkDefinitions with an explicit batch dimension. Please ensure the network was created using the EXPLICIT_BATCH NetworkDefinitionCreationFlag."
&&&& FAILED TensorRT.sample_onnx_mnist # ./sample_onnx_mnist

Target platform:
-rw-r--r-- 1 root root 2375856 May 24 14:47 libnvonnxparser.so.6
-rw-r--r-- 1 root root 2375856 May 24 14:47 libnvonnxparser.so.6.3.1
lrwxrwxrwx 1 root root 24 May 24 14:50 libnvonnxparser.so -> libnvonnxparser.so.6.3.1

Could you please correct my actions or provide an appropriate link to the TRT 6.3 package?

Thank you,
FailTensorRT-6-01-08_sampleOnnxMNIST.log (983 Bytes)

Dear @anton.nesterenko ,
Note that TRT releases for the DRIVE platform are built into the DRIVE release and are not available separately on developer.nvidia.com. Could you check the /usr/src/tensorrt/samples/ folder on the host and confirm whether you have the sampleOnnxMNIST sample?

@SivaRamaKrishnaNV,

That is the problem: from the very beginning I couldn’t find any such folders in my DRIVE AGX OS:

nvidia@tegra-ubuntu:$ ls -ltr /usr/local/tensorrt/
ls: cannot access ‘/usr/local/tensorrt/’: No such file or directory
nvidia@tegra-ubuntu:$ ls -ltr /usr/local/
bin/ cuda-10.2/ driveupdate/ games/ lib/ man/ share/
cuda/ cuda-11.1/ etc/ include/ libexec/ sbin/ src/

Also, I didn’t find any mention of it in the NVIDIA DRIVE OS 5.2 Linux Release Notes.
However, I found the following for Jetson Xavier NX:
JetPack component sample locations on the reference filesystem:
TensorRT: /usr/src/tensorrt/samples/
That is why I asked about using the appropriate target board.

Dear @anton.nesterenko,
Please check the /usr/src/tensorrt/samples/ folder on the host, not on the target. On the target, we have removed all samples and header files to save space. We expect you to cross-compile samples on the host and run them on the target.

Dear @SivaRamaKrishnaNV ,

Sorry, I didn’t catch your request.
So, let me clarify my issues and actions regarding cross compilation:

Host PC: Ubuntu 20.10 64-bit, x86_64

  1. Initially, I downloaded TensorRT-7.2.3.4.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.1.tar.gz.
    Following the recommendations in
    Installation Guide :: NVIDIA Deep Learning TensorRT Documentation,
    all components were installed.
  2. Compiling the /samples folder for x86_64 finished successfully, and each sample can be launched.
  3. To build /samples for ARM (the target board), I followed
    Sample Support Guide :: NVIDIA Deep Learning TensorRT Documentation:
    make TARGET=aarch64 VERBOSE=TRUE
  4. During the aarch64 compilation I got the following errors:
  • Errors with CUDA libraries:
    NVIDIA doesn’t provide arm64-sbsa CUDA libs for Ubuntu 20.10,
    so I copied the necessary aarch64 CUDA libs from the target platform to the host.
    This step was fixed.
  • Eventually I got stuck on:
    libnvinfer.so
    libnvparsers.so
    libnvinfer_plugin.so
    libnvonnxparser.so
    libmyelin.so
    These libraries were skipped by the compiler due to an incompatible type.
    It looks like I have to use the aarch64 versions of these libs instead of x86_64, but I can’t find them or build them locally.
    Please see the cross-compile log CrossCompileErrors.txt.
    CrossCompileErrors.txt (6.9 KB)
  5. After that, I decided to compile and run the code directly on the target platform.

The folder /usr/src/tensorrt/samples/ does not exist on the host PC either.

I agree with you that a cross-compile solution would be more convenient.
Could you please advise me how to solve the described issues with the mentioned libs?

Dear @anton.nesterenko,
Could you confirm whether you used the same host to flash the target with sdkmanager? In that case, I would expect the TRT samples to be installed on the host.

Dear @SivaRamaKrishnaNV ,

Unfortunately, I didn’t use my desktop for flashing the target, because NVIDIA doesn’t provide an sdkmanager package for Ubuntu 20.10.
So, the target platform was flashed from another host with Ubuntu 18.04.
From my point of view, cross compilation of TRT resources should work from any host PC.
Maybe I’m wrong?

Dear @anton.nesterenko,
Now the issue is clear.

From my point of view, cross compilation of TRT resources should work from any host PC.
Maybe I’m wrong?

But how do you get TRT 6.3.1 on the new host? As I said, you need to use the matching TRT version for cross compilation, and this TRT version is only part of the DRIVE release. You can download sdkmanager, choose installing DRIVE OS on the host, select flashing (this sets up cross compilation and the aarch64 libs on the host), and skip the actual flashing at the end.

Dear @SivaRamaKrishnaNV ,

I used the following setup:

  1. Host Ubuntu 18.04: flashed DRIVE OS 5.2.0 → target AGX
  2. Host Ubuntu 20.10: the current host where I’m trying to cross-compile TRT.

>> But how do you get TRT 6.3.1 on the new host?
I can’t get TRT 6.3.1 for cross compilation, because it’s part of DRIVE OS.

>> You can download sdkmanager, choose installing DRIVE OS on the host, select flashing (this sets up cross compilation and the aarch64 libs on the host), and skip the actual flashing at the end.

It’s impossible to do that from a host with Ubuntu 20.10.
After launching sdkmanager, the SDK window said:
TARGET OPERATING SYSTEM: Linux No available releases for ubuntu 2010
I can’t move to STEP 2 and start flashing.
That was the reason for using Ubuntu 18.04.

Dear @anton.nesterenko ,
You need to use Ubuntu 18.04 to get the DRIVE OS 5.2.0 release. There is no workaround for it.

Dear @SivaRamaKrishnaNV ,

If I copy the TRT data from the Ubuntu 18.04 host (the host where flashing was performed) to my current host, will cross compilation work?

Dear @anton.nesterenko ,
If I copy the TRT data from the Ubuntu 18.04 host (the host where flashing was performed) to my current host, will cross compilation work?

You need to have matching CUDA/cuDNN versions as well. Note that Ubuntu 18.04 and 20.10 will have different libs. We have not officially verified this combination.

Dear @SivaRamaKrishnaNV ,

Thank you very much for the great and fast support!
I’ll follow your directions. I guess we can close this topic.
