Error in Deserialization of TRT Engine: TLT 2, DeepStream SDK 5, YOLOv3, Jetson Nano

Hello,

Using TLT 2 running in a Docker container on a 1080TI, I trained a YOLOv3 model for deployment in DeepStream SDK 5 on Jetson Nano. I used quantization-aware training and the following export command in the Docker image:

!tlt-export yolo -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolo_resnet18_epoch_$EPOCH.tlt
-o $USER_EXPERIMENT_DIR/qat_int8_export/yolo_resnet18_epoch_$EPOCH.etlt
-e $SPECS_DIR/yolo_retrain_resnet18_kitti.txt
-k $KEY
--cal_image_dir $USER_EXPERIMENT_DIR/data/valid/images
--data_type int8
--batch_size 16
--batches 10
--cal_cache_file $USER_EXPERIMENT_DIR/qat_int8_export/cal.bin
--cal_data_file $USER_EXPERIMENT_DIR/export2/cal.tensorfile

With the resulting etlt and bin files transferred to my Jetson Nano, I ran the following engine generation script on the Jetson Nano:

~/Downloads/tlt-converter -k
-d 3,512,512
-o BatchedNMS
-c ~/Downloads/int8_export/export2/cal.bin
-e ~/Downloads/int8_export/drone_yolov3.plan
-m 16
-t int8
~/Downloads/int8_export/export2/yolo_resnet18_epoch_003.etlt

This generated the engine plan file. After running make in the YOLO custom parser directory, I modified the DeepStream application configuration and primary inference configuration files, both of which are attached.

When I run DeepStream, I get the following error:

Warning: 'input-dims' parameter has been deprecated. Use 'infer-dims' instead.
Warn: 'threshold' parameter has been deprecated. Use 'pre-cluster-threshold' instead.
Opening in BLOCKING MODE
gstnvtracker: Loading low-level lib at /opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
gstnvtracker: Optional NvMOT_RemoveStreams not implemented
gstnvtracker: Batch processing is OFF
gstnvtracker: Past frame output is OFF
0:00:13.674351110 14579 0x43a6400 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1701> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/drone_yolov3.plan
INFO: [Implicit Engine Info]: layers num: 5
0 INPUT kFLOAT Input 3x512x512
1 OUTPUT kINT32 BatchedNMS 0
2 OUTPUT kFLOAT BatchedNMS_1 200x4
3 OUTPUT kFLOAT BatchedNMS_2 200
4 OUTPUT kFLOAT BatchedNMS_3 200

0:00:13.674637466 14579 0x43a6400 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1805> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/drone_yolov3.plan
0:00:13.801591144 14579 0x43a6400 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/MOD_config_infer_primary_yoloV3.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

p: Pause
r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.

**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:181>: Pipeline ready

** INFO: <bus_callback:167>: Pipeline running

NvMMLiteOpen : Block : BlockType = 4
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
ERROR: yoloV3 output layer.size: 4 does not match mask.size: 3
0:00:17.325440295 14579 0x3e49a80 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:725> [UID = 1]: Failed to parse bboxes using custom parse function
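
To double-check the numbers in that error, the layer listing from the log can be tallied with a short script. This is only a sketch: the regular expression and the `parse_layers` helper are my own assumptions about the log format, not part of DeepStream. It counts five bindings in total, four of them outputs, which seems to line up with the `output layer.size: 4` reported by the parser.

```python
import re

# Layer listing copied verbatim from the "[Implicit Engine Info]" log above.
ENGINE_INFO = """\
0 INPUT kFLOAT Input 3x512x512
1 OUTPUT kINT32 BatchedNMS 0
2 OUTPUT kFLOAT BatchedNMS_1 200x4
3 OUTPUT kFLOAT BatchedNMS_2 200
4 OUTPUT kFLOAT BatchedNMS_3 200
"""

# Assumed line format: <index> <INPUT|OUTPUT> <dtype> <name> <dims>
LAYER_RE = re.compile(r"^(\d+)\s+(INPUT|OUTPUT)\s+(\w+)\s+(\S+)\s+(\S+)$")


def parse_layers(text):
    """Hypothetical helper: turn the log listing into a list of dicts."""
    layers = []
    for line in text.splitlines():
        match = LAYER_RE.match(line.strip())
        if match:
            index, kind, dtype, name, dims = match.groups()
            layers.append(
                {"index": int(index), "kind": kind, "dtype": dtype,
                 "name": name, "dims": dims}
            )
    return layers


layers = parse_layers(ENGINE_INFO)
outputs = [layer["name"] for layer in layers if layer["kind"] == "OUTPUT"]

print("total bindings:", len(layers))
print("output bindings:", len(outputs), outputs)
```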

It seems odd that the engine deserializes to only five layers. Regardless, why is this error occurring? Note that I have tried this in FP16 and gotten the same result.

Attachments: MOD_config_infer_primary_yoloV3.txt (3.2 KB), MOD_deepstream_app_config_yoloV3.txt (3.8 KB)