tao-converter int8 engine export core dump on Xavier NX, JetPack 4.6

Dear forum,

I’m trying to use the gorgeous PeopleSegNet in its fastest form on Xavier NX, i.e. at int8 precision. I’m on JetPack 4.6-b199 with the JetPack-specific tao-converter build (tao-converter-jp46-trt8.0.1.6). I’m unable to generate the TensorRT .plan file at this precision: tao-converter fails with a core dump.

Failing command

Since the original .etlt file was exported at int8 precision, I assumed the conversion would be straightforward, so I tried this:

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 4679 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 519 MiB, GPU 4853 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +294, now: CPU 751, GPU 5147 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +420, now: CPU 1058, GPU 5567 (MiB)
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 19968
[INFO] Total Device Persistent Memory: 0
[INFO] Total Scratch Memory: 432907776
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1225 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1538 MiB, GPU 6715 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation end: CPU 1540 MiB, GPU 7448 MiB
[INFO] Starting Calibration with batch size 8.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1540, GPU 7453 (MiB)
[INFO]   Post Processing Calibration data in 1.8624e-05 seconds.
[ERROR] 1: Unexpected exception _Map_base::at
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Do you see anything wrong here?

Succeeding commands

What frustrates me is that both the fp16 and fp32 conversions do not crash at all…

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_fp16.plan -o generate_detections,mask_fcn_logits/BiasAdd -t fp16 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
...
$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test.plan -o generate_detections,mask_fcn_logits/BiasAdd -w 2000000000 ./peoplesegnet_resnet50_int8.etlt

System information

Here is some more information to help track down the issue.

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic
apt show nvidia-jetpack 
Package: nvidia-jetpack
Version: 4.6-b199
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29,4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.6/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Thanks a lot in advance
Best regards.

Could you share the link for the tao-converter? I cannot open it.

Did you try the TensorRT — TAO Toolkit 3.21.11 documentation?

I just edited my first post to fix the link; it is actually https://developer.nvidia.com/tao-converter-jp4.6, from the table in the TensorRT — TAO Toolkit 3.21.11 documentation.

I focused my trials on TAO because I was interested in the complete framework, and I used the documentation you linked as the basis for those trials.

Please find below some more information.

$ ldd /home/yumain/bin/tao-converter
	linux-vdso.so.1 (0x0000007f9701a000)
	libgtk3-nocsd.so.0 => /usr/lib/aarch64-linux-gnu/libgtk3-nocsd.so.0 (0x0000007f96f92000)
	libcrypto.so.1.1 => /usr/lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000007f96d51000)
	libnvinfer.so.8 => /usr/lib/aarch64-linux-gnu/libnvinfer.so.8 (0x0000007f8b7b3000)
	libnvonnxparser.so.8 => /usr/lib/aarch64-linux-gnu/libnvonnxparser.so.8 (0x0000007f8b517000)
	libnvparsers.so.8 => /usr/lib/aarch64-linux-gnu/libnvparsers.so.8 (0x0000007f8b1d7000)
	libnvinfer_plugin.so.8 => /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8 (0x0000007f89d57000)
	libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f89bc3000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f89b9f000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f89a46000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f96fee000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f89a31000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f89a05000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f899ee000)
	libnvdla_compiler.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so (0x0000007f895e3000)
	libEGL.so.1 => /usr/lib/aarch64-linux-gnu/libEGL.so.1 (0x0000007f895c2000)
	libnvmedia.so => /usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so (0x0000007f89557000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f8949e000)
	libcublas.so.10 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublas.so.10 (0x0000007f84736000)
	libcublasLt.so.10 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublasLt.so.10 (0x0000007f82720000)
	libcudart.so.10.2 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcudart.so.10.2 (0x0000007f82698000)
	libcudnn.so.8 => /usr/lib/aarch64-linux-gnu/libcudnn.so.8 (0x0000007f8265f000)
	libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000007f82641000)
	libGLdispatch.so.0 => /usr/lib/aarch64-linux-gnu/libGLdispatch.so.0 (0x0000007f82515000)
	libnvrm.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so (0x0000007f824d2000)
	libnvrm_graphics.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so (0x0000007f824b2000)
	libnvdc.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdc.so (0x0000007f82492000)
	libnvtvmr.so => /usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so (0x0000007f82402000)
	libnvparser.so => /usr/lib/aarch64-linux-gnu/tegra/libnvparser.so (0x0000007f823c6000)
	libnvdla_runtime.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdla_runtime.so (0x0000007f82328000)
	libnvimp.so => /usr/lib/aarch64-linux-gnu/tegra/libnvimp.so (0x0000007f82313000)

The checksum of libnvinfer_plugin.so.8 is b3376b346871f2588ba7fc07ada50147 (matching the one published at https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/TRT-OSS/Jetson).
I also tried with one I built myself, without success.

Can you try "-w 1000000000"?

Done.
I had initially tried without setting the workspace size at all; 2000000000, 1000000000, and omitting the option all lead to the same core dump.
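For reference, -w takes the maximum workspace size in bytes (per the TAO docs), so the values tried above are roughly 1.9 GiB and 0.95 GiB. A trivial conversion sketch, just to make the arithmetic explicit:

```python
def mib_to_bytes(mib):
    """Workspace size in MiB -> the byte count expected by tao-converter -w."""
    return mib * 1024 * 1024

def bytes_to_mib(n):
    """Byte count -> MiB, for reading the -w values used above."""
    return n / (1024 * 1024)

print(bytes_to_mib(2000000000))  # ~1907.35 MiB
print(bytes_to_mib(1000000000))  # ~953.67 MiB
```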

Could you try enabling --strict_data_type?

Not sure if the known limitation below also applies to Xavier:
https://docs.nvidia.com/tao/tao-toolkit/text/release_notes.html#known-issues-limitations

I think I have found half a solution to the problem.

The -c option (the int8 calibration cache), which I had not specified since it is not marked as mandatory and I was exporting an int8 .etlt to an int8 .plan, turns out to be required.
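As an aside, the calibration cache passed via -c is a small text file: a header line (e.g. `TRT-8001-EntropyCalibration2`) followed by `tensor_name: hex` lines, where the hex is (as far as I can tell) the big-endian IEEE-754 encoding of a float32 scale. A minimal Python sketch to inspect one, under that assumption:

```python
import struct

def parse_calibration_cache(text):
    """Parse a TensorRT calibration cache into {tensor_name: scale}.

    Assumes the common layout: a header line such as
    'TRT-8001-EntropyCalibration2', then 'name: hex' lines where hex
    is the big-endian IEEE-754 encoding of a float32 scale.
    """
    scales = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        name, _, hexval = line.rpartition(":")
        name, hexval = name.strip(), hexval.strip()
        if not name or not hexval:
            continue
        (scale,) = struct.unpack(">f", bytes.fromhex(hexval))
        scales[name] = scale
    return scales

# Hypothetical two-entry cache, for illustration only:
sample = "TRT-8001-EntropyCalibration2\ninput_1: 3c010a14\nconv1/BiasAdd: 3d8d6038"
for tensor, scale in parse_calibration_cache(sample).items():
    print(f"{tensor}: {scale:.6g}")
```

The tensor names and hex values in `sample` are made up; point it at the real ./peoplesegnet_resnet50_int8.txt to see the actual per-tensor scales.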

I tried this command and got much further before the core dump:

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -c ./peoplesegnet_resnet50_int8.txt -t int8 -b 1 -m 1 ./peoplesegnet_resnet50_int8.etlt
...
[WARNING] Skipping tactic 8 due to oom error on requested size of 1818 detected for tactic 60.
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 3 due to oom error on requested size of 1814 detected for tactic 4.
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 7 due to oom error on requested size of 1814 detected for tactic 60.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1577, GPU 4649 (MiB)
[ERROR] 1: Unexpected exception std::bad_alloc
[ERROR] Unable to create engine
Segmentation fault (core dumped)
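The log is dominated by these tactic OOM lines; a throwaway sketch (assuming the exact log wording shown above) to pull out the requested vs. available sizes when skimming a long converter log:

```python
import re

# Two lines copied from the converter output above.
LOG = """\
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 7 due to oom error on requested size of 1814 detected for tactic 60.
"""

# Match the "request: <n>MB Available: <n>MB" wording from the [ERROR] lines.
pattern = re.compile(r"request:\s*(\d+)MB\s+Available:\s*(\d+)MB")
for requested, available in pattern.findall(LOG):
    deficit = int(requested) - int(available)
    print(f"tactic needs {requested} MB, only {available} MB free (short by {deficit} MB)")
```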

A note in the documentation, a runtime warning, or a raised exception would at least help here.

By the way, it still segfaults, but with a different error.

I’m now trying with -s, since --strict_data_type is not recognized, hoping that is the second half… I’ll keep you posted.

And here we are! The following command finally succeeded in generating an int8 engine for Xavier NX under JetPack 4.6.

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -c ./peoplesegnet_resnet50_int8.txt -t int8 -b 1 -m 1 -w 2000000000 -s ./peoplesegnet_resnet50_int8.etlt

Thanks for the help @Morganh
