Issue running tlt trained SSD-resnet18 on Xavier with deepstream-app

I trained an SSD with a ResNet18 backbone on my customized dataset by following the notebook tutorial. I used tlt-converter on Xavier to build an INT8 TensorRT engine from the .etlt, using the calibration cache. Then I deployed it with deepstream-app. Here is what I saw on the display:


The bounding boxes are scattered all around the frame; these messy bounding boxes stay fixed at their locations and do not move when objects pass by.

I tried FP32 and the detections never look like this; the bounding boxes sit tightly around the specific objects. It seems to me the error happens at INT8 calibration with tlt-converter:

./tlt-converter ~/Desktop/new/ssd_resnet18_epoch_025.etlt \
                -c ~/Desktop/new/cal.bin \
                -t int8 \
                -e ~/Desktop/new/trtb.engine \
                -k MmJnZzFnM21xdXBmZ2l2MHRiY3VmYTNibzg6YjAyY2FlNjctOTk3ZC00NDkzLTkyNjItNTVjZGMzODE0Mjcx \
                -d 3,512,512 \
                -o NMS

Here are my deepstream config files for your reference:

config_infer_primary_miotcd.txt (1.3 KB)
deepstream_app_source1_miotcd.txt (2.5 KB)

Thanks!

Could you refer to the config file at https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/blob/master/pgie_ssd_tlt_config.txt, in particular:

output-blob-names=NMS
parse-bbox-func-name=NvDsInferParseCustomSSDTLT
custom-lib-path=./nvdsinfer_customparser_ssd_tlt/libnvds_infercustomparser_ssd_tlt.so

It seems that your config file is not correct.
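For reference, the model, input, and parser related keys in that sample look roughly as follows when adapted to the 512x512 SSD in this thread; the paths, key, and class count below are placeholders, so double-check everything against the file in the repo:

[property]
# placeholder paths and key -- point these at your own files
tlt-encoded-model=./ssd_resnet18_epoch_025.etlt
tlt-model-key=<your encoding key>
int8-calib-file=./cal.bin
# 0=FP32, 1=INT8, 2=FP16
network-mode=1
uff-input-dims=3;512;512;0
uff-input-blob-name=Input
batch-size=1
# placeholder: number of classes in your labels file
num-detected-classes=4
output-blob-names=NMS
parse-bbox-func-name=NvDsInferParseCustomSSDTLT
custom-lib-path=./nvdsinfer_customparser_ssd_tlt/libnvds_infercustomparser_ssd_tlt.so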

Thank you! I changed the config following your instructions, but it still doesn’t work and the messy bounding boxes are still there in the same fixed locations. A confusing fact I just found out is that there isn’t any bounding box at all when I set threshold=0.4 in config_infer_primary_miotcd.txt:

[class-attrs-all]
threshold=0.4
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

But when I decrease the threshold from 0.4 to 0.3, 0.2, and even 0.1, there still isn’t a single bounding box on the display, as in the picture below.

When the threshold is set to 0.09 and reduced further to 0.08, 0.07, 0.06 … 0.01, the messy bounding boxes with fixed locations start to appear, and more and more of them show up as the threshold drops below 0.09.

First, please check that your inference result is good with the tlt-infer tool. Run it against one jpg/png file.
We want to know if your model works well.
If you do not know how to use tlt-infer, please refer to the Jupyter notebook for help.

I ran tlt-infer again with the following command:

tlt-infer ssd -i $DATA_DOWNLOAD_DIR/training/infer_2 \
               -o $USER_EXPERIMENT_DIR/ssd_infer_iti \
               -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
               -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_025.tlt \
               -l $USER_EXPERIMENT_DIR/ssd_inter_iti_label \
               -k $KEY

And the following are some sample results:

I think inference on still images works fine with tlt-infer, and deployment of the FP32 model on videos also works reasonably well on Xavier.

I have a question: should tlt-export also be hardware specific? For INT8, my .etlt and cal.bin were generated by tlt-export on my training hardware (Titan XP). My tlt-export command on the training hardware is as follows:

tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_resnet18_epoch_025.tlt  \
                -o $USER_EXPERIMENT_DIR/exportt/ssd_resnet18_epoch_025.etlt \
                -e $SPECS_DIR/ssd_train_resnet18_kitti.txt \
                -k $KEY \
                --cal_image_dir  $USER_EXPERIMENT_DIR/data/training/cal_2 \
                --data_type int8 \
                --batch_size 1 \
                --batches 5000 \
                --cal_cache_file $USER_EXPERIMENT_DIR/exportt/cal.bin  \
                --cal_data_file $USER_EXPERIMENT_DIR/exportt/cal.tensorfile 

I then copy the .etlt and cal.bin to my deployment hardware (Jetson Xavier) and use the Jetson version of tlt-converter to build the INT8 trt.engine from them. Is this a reasonable workflow in your view?
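Concretely, the copy step is just moving those two artifacts across; the host name below is a placeholder, and the source paths are the ones from the tlt-export command above:

# placeholder host name -- copy the export artifacts from the training host to the Xavier
scp $USER_EXPERIMENT_DIR/exportt/ssd_resnet18_epoch_025.etlt nvidia@xavier:~/Desktop/new/
scp $USER_EXPERIMENT_DIR/exportt/cal.bin nvidia@xavier:~/Desktop/new/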

Thanks!

Yes, your workflow is fine.
According to your tlt-infer result, there is likely something wrong with how the model is deployed in DeepStream.

Please double check against the Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.
I suggest using the SSD part of GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps that demonstrate how to deploy models trained with TAO on DeepStream) directly.
Replace your model, cal.bin, etc.
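For example, assuming the repository’s models/ssd layout (adjust the paths to your checkout), something along these lines:

# copy the exported model and calibration cache over the sample SSD files
cp ~/Desktop/new/ssd_resnet18_epoch_025.etlt deepstream_tlt_apps/models/ssd/
cp ~/Desktop/new/cal.bin deepstream_tlt_apps/models/ssd/
# then point tlt-encoded-model and int8-calib-file in pgie_ssd_tlt_config.txt at them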

BTW, your config is missing

offsets=103.939;116.779;123.68
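In the sample config these preprocessing keys sit together in [property]; the values correspond to TLT’s caffe-style mean subtraction in BGR order, so verify they match what you trained with:

net-scale-factor=1.0
offsets=103.939;116.779;123.68
# 1 = BGR
model-color-format=1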

Thanks! But it still doesn’t work. I used the same config files for FP32 deployment on video streams and it works reasonably well. So I think my deepstream-app config is okay, but I suspect there is some issue with the INT8 calibration by tlt-converter on Xavier. I used the following command:

./tlt-converter ~/Desktop/new/ssd_resnet18_epoch_025.etlt \
                -c ~/Desktop/new/calb.bin \
                -t int8 \
                -e ~/Desktop/new/trtb.engine \
                -k MmJnZzFnM21xdXBmZ2l2MHRiY3VmYTNibzg6YjAyY2FlNjctOTk3ZC00NDkzLTkyNjItNTVjZGMzODE0Mjcx \
                -d 3,512,512 \
                -o NMS

And here is my log:

[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[INFO] 
[INFO] --------------- Layers running on DLA: 
[INFO] 
[INFO] --------------- Layers running on GPU: 
[INFO] conv1/convolution + activation_3/Relu, block_1a_conv_1/convolution + block_1a_relu_1/Relu, block_1a_conv_2/convolution, block_1a_conv_shortcut/convolution + add_17/add + block_1a_relu/Relu, block_1b_conv_1/convolution + block_1b_relu_1/Relu, block_1b_conv_2/convolution, block_1b_conv_shortcut/convolution + add_18/add + block_1b_relu/Relu, block_2a_conv_1/convolution + block_2a_relu_1/Relu, block_2a_conv_2/convolution, block_2a_conv_shortcut/convolution + add_19/add + block_2a_relu/Relu, block_2b_conv_1/convolution + block_2b_relu_1/Relu, block_2b_conv_2/convolution, block_2b_conv_shortcut/convolution + add_20/add + block_2b_relu/Relu, ssd_conf_0/convolution, block_3a_conv_1/convolution + block_3a_relu_1/Relu, ssd_loc_0/convolution, ssd_anchor_0/Const, block_3a_conv_2/convolution, FirstDimTile_0, block_3a_conv_shortcut/convolution + add_21/add + block_3a_relu/Relu, block_3b_conv_1/convolution + block_3b_relu_1/Relu, block_3b_conv_2/convolution, block_3b_conv_shortcut/convolution + add_22/add + block_3b_relu/Relu, block_4a_conv_1/convolution + block_4a_relu_1/Relu, block_4a_conv_2/convolution, block_4a_conv_shortcut/convolution + add_23/add + block_4a_relu/Relu, block_4b_conv_1/convolution + block_4b_relu_1/Relu, block_4b_conv_2/convolution, block_4b_conv_shortcut/convolution + add_24/add + block_4b_relu/Relu, ssd_expand_block_0_conv_0/convolution + ssd_expand_block_0_relu_0/Relu, ssd_expand_block_0_conv_1/convolution + ssd_expand_block_0_relu_1/Relu, ssd_conf_1/convolution, ssd_expand_block_1_conv_0/convolution + ssd_expand_block_1_relu_0/Relu, ssd_loc_1/convolution, ssd_anchor_1/Const, ssd_expand_block_1_conv_1/convolution + ssd_expand_block_1_relu_1/Relu, FirstDimTile_1, ssd_conf_2/convolution, ssd_expand_block_2_conv_0/convolution + ssd_expand_block_2_relu_0/Relu, ssd_loc_2/convolution, ssd_anchor_2/Const, ssd_expand_block_2_conv_1/convolution + ssd_expand_block_2_relu_1/Relu, FirstDimTile_2, ssd_conf_3/convolution, ssd_expand_block_3_conv_0/convolution + ssd_expand_block_3_relu_0/Relu, ssd_loc_3/convolution, ssd_anchor_3/Const, ssd_expand_block_3_conv_1/convolution + ssd_expand_block_3_relu_1/Relu, FirstDimTile_3, ssd_conf_4/convolution, ssd_expand_block_4_conv_0/convolution + ssd_expand_block_4_relu_0/Relu, ssd_loc_4/convolution, ssd_anchor_4/Const, ssd_expand_block_4_conv_1/convolution + ssd_expand_block_4_relu_1/Relu, FirstDimTile_4, ssd_conf_5/convolution, ssd_loc_5/convolution, ssd_anchor_5/Const, FirstDimTile_5, anchor_reshape_0/Reshape, anchor_reshape_1/Reshape, anchor_reshape_2/Reshape, anchor_reshape_3/Reshape, anchor_reshape_4/Reshape, anchor_reshape_5/Reshape, anchor_reshape/Reshape + anchor_permute/transpose + (Unnamed Layer* 538) [Shuffle], anchor_data/Reshape, permute_26/transpose + (Unnamed Layer* 547) [Shuffle] + loc_reshape_0/Reshape, permute_28/transpose + (Unnamed Layer* 556) [Shuffle] + loc_reshape_1/Reshape, permute_30/transpose + (Unnamed Layer* 565) [Shuffle] + loc_reshape_2/Reshape, permute_32/transpose + (Unnamed Layer* 574) [Shuffle] + loc_reshape_3/Reshape, permute_34/transpose + (Unnamed Layer* 583) [Shuffle] + loc_reshape_4/Reshape, permute_36/transpose + (Unnamed Layer* 592) [Shuffle] + loc_reshape_5/Reshape, loc_data/Reshape, permute_25/transpose + (Unnamed Layer* 613) [Shuffle] + conf_reshape_0/Reshape, permute_27/transpose + (Unnamed Layer* 626) [Shuffle] + conf_reshape_1/Reshape, permute_29/transpose + (Unnamed Layer* 639) [Shuffle] + conf_reshape_2/Reshape, permute_31/transpose + (Unnamed Layer* 652) [Shuffle] + conf_reshape_3/Reshape, 
permute_33/transpose + (Unnamed Layer* 665) [Shuffle] + conf_reshape_4/Reshape, permute_35/transpose + (Unnamed Layer* 678) [Shuffle] + conf_reshape_5/Reshape, conf_data/Reshape, NMS, 
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.

Thank you

Can you paste the latest config file for when you run INT8?

deepstream_app_source1_miotcd.txt (2.6 KB)
config_infer_primary_miotcd.txt (1.3 KB)

Here are the latest configs for running the deepstream-app command. Thank you.

You mentioned that it works well in FP32 mode, right? Can you confirm?
If yes, could you paste those config files too? Thanks.

Yes. The FP32 models work pretty well. Here are the configs.

config_infer_primary_miotcd.txt (1.3 KB)
deepstream_app_source1_miotcd.txt (2.6 KB)

Thank you!

It’s solved by using the engine file generated by deepstream-custom in the deepstream_tlt_apps sample, rather than the one built with the tlt-export and tlt-converter commands.
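For anyone hitting the same problem: once deepstream-custom has built and serialized the engine, the nvinfer config used by deepstream-app can point straight at that file instead of rebuilding it. The file name below just follows nvinfer’s default naming pattern, so substitute whatever was actually written next to the model:

# placeholder path -- use the engine file that deepstream-custom actually serialized
model-engine-file=./ssd_resnet18_epoch_025.etlt_b1_gpu0_int8.engine
network-mode=1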

Thanks for the info. BTW, what are the versions of your TLT docker and DeepStream?

DeepStream is 5.0 and the TLT docker is tlt-streamanalytics:v2.0_dp_py2. Thanks!