Dear forum,
I’m trying to use the gorgeous PeopleSegNet in its fastest form on a Xavier NX, i.e. with int8 precision. I’m on JetPack 4.6-b199 with the JetPack-specific tao-converter build (tao-converter-jp46-trt8.0.1.6). The problem: generating the TensorRT .plan file at this precision fails, with tao-converter crashing with a core dump.
Failing command
Since the original .etlt was exported with int8 calibration, I expected this to work out of the box, so I tried:
$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 4679 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 519 MiB, GPU 4853 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +294, now: CPU 751, GPU 5147 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU +420, now: CPU 1058, GPU 5567 (MiB)
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 19968
[INFO] Total Device Persistent Memory: 0
[INFO] Total Scratch Memory: 432907776
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1225 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1519, GPU 6691 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1538 MiB, GPU 6715 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1538, GPU 6715 (MiB)
[INFO] [MemUsageSnapshot] ExecutionContext creation end: CPU 1540 MiB, GPU 7448 MiB
[INFO] Starting Calibration with batch size 8.
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1540, GPU 7453 (MiB)
[INFO] Post Processing Calibration data in 1.8624e-05 seconds.
[ERROR] 1: Unexpected exception _Map_base::at
[ERROR] Unable to create engine
Segmentation fault (core dumped)
Do you see anything wrong here?
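One thing I am unsure about: maybe int8 conversion needs the calibration cache that ships with the model on NGC, passed via -c. If so, I guess the command would look like this (the -c flag comes from the tao-converter help; the cache file name peoplesegnet_resnet50_int8.txt is my assumption from the model card, and I have not confirmed it avoids the crash):
$ tao-converter -k nvidia_tlt -d 3,576,960 -c ./peoplesegnet_resnet50_int8.txt -e ./model_test_int8.plan -o generate_detections,mask_fcn_logits/BiasAdd -t int8 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt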
Succeeding commands
What frustrates me is that the same conversion in fp16 and fp32 does not crash at all:
$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test_fp16.plan -o generate_detections,mask_fcn_logits/BiasAdd -t fp16 -s -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
...
$ tao-converter -k nvidia_tlt -d 3,576,960 -e ./model_test.plan -o generate_detections,mask_fcn_logits/BiasAdd -w 2000000000 ./peoplesegnet_resnet50_int8.etlt
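If it is useful, here is how I plan to sanity-check the engines that do build, using trtexec --loadEngine (assuming the stock JetPack install path for trtexec):
$ /usr/src/tensorrt/bin/trtexec --loadEngine=./model_test_fp16.plan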
System information
Here is some more information to help track down the issue:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic
apt show nvidia-jetpack
Package: nvidia-jetpack
Version: 4.6-b199
Priority: standard
Section: metapackages
Maintainer: NVIDIA Corporation
Installed-Size: 199 kB
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Download-Size: 29,4 kB
APT-Sources: https://repo.download.nvidia.com/jetson/t194 r32.6/main arm64 Packages
Description: NVIDIA Jetpack Meta Package
Thanks a lot in advance
Best regards.