tao-converter int8 engine export core dump on Xavier NX, JetPack 4.6

Dear forum,

I’m trying to use the gorgeous PeopleSegNet in its fastest form on Xavier NX, i.e. at int8 precision. I’m on JetPack 4.6-b199 with the JetPack-specific tao-converter build (tao-converter-jp46-trt8.0.1.6). I’m unable to generate the TensorRT .plan file at this precision: tao-converter fails with a core dump.

Failing command

Since the original .etlt file was exported at int8 precision, I assumed the conversion would be straightforward, so I tried this:

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 4679 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 519 MiB, GPU 4853 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +294, now: CPU 751, GPU 5147 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +420, now: CPU 1058, GPU 5567 (MiB)
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 19968
[INFO] Total Device Persistent Memory: 0
[INFO] Total Scratch Memory: 432907776
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1225 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1538 MiB, GPU 6715 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation end: CPU 1540 MiB, GPU 7448 MiB
[INFO] Starting Calibration with batch size 8.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1540, GPU 7453 (MiB)
[INFO]   Post Processing Calibration data in 1.8624e-05 seconds.
[ERROR] 1: Unexpected exception _Map_base::at
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Do you see anything wrong here?

Succeeding commands

What frustrates me is that both the fp16 and fp32 conversions do not crash at all…

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_fp16.plan -o generate_detections,mask_fcn_logits/BiasAdd -t fp16 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
...
$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test.plan -o generate_detections,mask_fcn_logits/BiasAdd -w 2000000000 ./peoplesegnet_resnet50_int8.etlt

System information

Here is some more information to help track down the issue.

lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.5 LTS
Release:	18.04
Codename:	bionic
apt show nvidia-jetpack 
Package: nvidia-jetpack
Version: 4.6-b199
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29,4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.6/main arm64 Packages
Description: NVIDIA Jetpack Meta Package

Thanks a lot in advance
Best regards.

Could you share the link for the tao-converter? I cannot open it.

Did you try the TensorRT — TAO Toolkit 3.21.11 documentation?

I just edited my first post to fix the link; it is actually https://developer.nvidia.com/tao-converter-jp4.6, from the table in the TensorRT — TAO Toolkit 3.21.11 documentation.

I focused my trials on TAO because I was interested in the complete framework, and I used the documentation you linked as the basis for those trials.

Please find below some more information.

$ ldd /home/yumain/bin/tao-converter
	linux-vdso.so.1 (0x0000007f9701a000)
	libgtk3-nocsd.so.0 => /usr/lib/aarch64-linux-gnu/libgtk3-nocsd.so.0 (0x0000007f96f92000)
	libcrypto.so.1.1 => /usr/lib/aarch64-linux-gnu/libcrypto.so.1.1 (0x0000007f96d51000)
	libnvinfer.so.8 => /usr/lib/aarch64-linux-gnu/libnvinfer.so.8 (0x0000007f8b7b3000)
	libnvonnxparser.so.8 => /usr/lib/aarch64-linux-gnu/libnvonnxparser.so.8 (0x0000007f8b517000)
	libnvparsers.so.8 => /usr/lib/aarch64-linux-gnu/libnvparsers.so.8 (0x0000007f8b1d7000)
	libnvinfer_plugin.so.8 => /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8 (0x0000007f89d57000)
	libstdc++.so.6 => /usr/lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f89bc3000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f89b9f000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f89a46000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f96fee000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f89a31000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f89a05000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f899ee000)
	libnvdla_compiler.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so (0x0000007f895e3000)
	libEGL.so.1 => /usr/lib/aarch64-linux-gnu/libEGL.so.1 (0x0000007f895c2000)
	libnvmedia.so => /usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so (0x0000007f89557000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f8949e000)
	libcublas.so.10 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublas.so.10 (0x0000007f84736000)
	libcublasLt.so.10 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcublasLt.so.10 (0x0000007f82720000)
	libcudart.so.10.2 => /usr/local/cuda-10.2/targets/aarch64-linux/lib/libcudart.so.10.2 (0x0000007f82698000)
	libcudnn.so.8 => /usr/lib/aarch64-linux-gnu/libcudnn.so.8 (0x0000007f8265f000)
	libnvos.so => /usr/lib/aarch64-linux-gnu/tegra/libnvos.so (0x0000007f82641000)
	libGLdispatch.so.0 => /usr/lib/aarch64-linux-gnu/libGLdispatch.so.0 (0x0000007f82515000)
	libnvrm.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm.so (0x0000007f824d2000)
	libnvrm_graphics.so => /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so (0x0000007f824b2000)
	libnvdc.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdc.so (0x0000007f82492000)
	libnvtvmr.so => /usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so (0x0000007f82402000)
	libnvparser.so => /usr/lib/aarch64-linux-gnu/tegra/libnvparser.so (0x0000007f823c6000)
	libnvdla_runtime.so => /usr/lib/aarch64-linux-gnu/tegra/libnvdla_runtime.so (0x0000007f82328000)
	libnvimp.so => /usr/lib/aarch64-linux-gnu/tegra/libnvimp.so (0x0000007f82313000)

The checksum of libnvinfer_plugin.so.8 is b3376b346871f2588ba7fc07ada50147 (matching the one published at https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/TRT-OSS/Jetson).
I also tried with one I built myself, without success.

Can you try "-w 1000000000"?

Done.
I had initially tried without setting the workspace size at all; 2000000000, 1000000000, and omitting the option all lead to the same core dump.
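For reference, -w takes the maximum workspace size in bytes (per the TAO docs), so the values tried above are roughly 1.9 GiB and 0.95 GiB. A trivial conversion sketch, just to make the arithmetic explicit:

```python
def mib_to_bytes(mib):
    """Workspace size in MiB -> the byte count expected by tao-converter -w."""
    return mib * 1024 * 1024

def bytes_to_mib(n):
    """Byte count -> MiB, for reading the -w values used above."""
    return n / (1024 * 1024)

print(bytes_to_mib(2000000000))  # ~1907.35 MiB
print(bytes_to_mib(1000000000))  # ~953.67 MiB
```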

Could you try enabling --strict_data_type?

Not sure if the known limitation below also applies to Xavier:
https://docs.nvidia.com/tao/tao-toolkit/text/release_notes.html#known-issues-limitations

I think I have found half a solution to the problem.

The -c option (the int8 calibration cache), which I had not specified since it is not marked as mandatory and I was exporting an int8 .etlt to an int8 .plan, turns out to be required.
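As an aside, the calibration cache passed via -c is a small text file: a header line (e.g. `TRT-8001-EntropyCalibration2`) followed by `tensor_name: hex` lines, where the hex is (as far as I can tell) the big-endian IEEE-754 encoding of a float32 scale. A minimal Python sketch to inspect one, under that assumption:

```python
import struct

def parse_calibration_cache(text):
    """Parse a TensorRT calibration cache into {tensor_name: scale}.

    Assumes the common layout: a header line such as
    'TRT-8001-EntropyCalibration2', then 'name: hex' lines where hex
    is the big-endian IEEE-754 encoding of a float32 scale.
    """
    scales = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        name, _, hexval = line.rpartition(":")
        name, hexval = name.strip(), hexval.strip()
        if not name or not hexval:
            continue
        (scale,) = struct.unpack(">f", bytes.fromhex(hexval))
        scales[name] = scale
    return scales

# Hypothetical two-entry cache, for illustration only:
sample = "TRT-8001-EntropyCalibration2\ninput_1: 3c010a14\nconv1/BiasAdd: 3d8d6038"
for tensor, scale in parse_calibration_cache(sample).items():
    print(f"{tensor}: {scale:.6g}")
```

The tensor names and hex values in `sample` are made up; point it at the real ./peoplesegnet_resnet50_int8.txt to see the actual per-tensor scales.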

I tried this command and got much further before the core dump:

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -c ./peoplesegnet_resnet50_int8.txt -t int8 -b 1 -m 1 ./peoplesegnet_resnet50_int8.etlt
...
[WARNING] Skipping tactic 8 due to oom error on requested size of 1818 detected for tactic 60.
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 3 due to oom error on requested size of 1814 detected for tactic 4.
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 7 due to oom error on requested size of 1814 detected for tactic 60.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1577, GPU 4649 (MiB)
[ERROR] 1: Unexpected exception std::bad_alloc
[ERROR] Unable to create engine
Segmentation fault (core dumped)
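The log is dominated by these tactic OOM lines; a throwaway sketch (assuming the exact log wording shown above) to pull out the requested vs. available sizes when skimming a long converter log:

```python
import re

# Two lines copied from the converter output above.
LOG = """\
[ERROR] Tactic Device request: 1814MB Available: 1536MB. Device memory is insufficient to use tactic.
[WARNING] Skipping tactic 7 due to oom error on requested size of 1814 detected for tactic 60.
"""

# Match the "request: <n>MB Available: <n>MB" wording from the [ERROR] lines.
pattern = re.compile(r"request:\s*(\d+)MB\s+Available:\s*(\d+)MB")
for requested, available in pattern.findall(LOG):
    deficit = int(requested) - int(available)
    print(f"tactic needs {requested} MB, only {available} MB free (short by {deficit} MB)")
```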

A note in the documentation, a runtime warning, or a raised exception would at least help here.

By the way, it still segfaults, but with a different error.

I’m now trying with -s, since --strict_data_type is not recognized, hoping that is the second half… I’ll keep you posted.

And here we are! The following command finally succeeded in generating an int8 engine for Xavier NX under JetPack 4.6.

$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -c ./peoplesegnet_resnet50_int8.txt -t int8 -b 1 -m 1 -w 2000000000 -s ./peoplesegnet_resnet50_int8.etlt

Thanks for the help @Morganh
