Yolo V4 inference input size in DeepStream 5 after tlt-export

Hi all,

I have been training Yolo V4 with the CSPDarknet53 backbone based on the sample Jupyter Notebook provided with TLT 3.0. I could get decent accuracy and I’m quite happy with the trained model in TLT.

In order to set the input dimensions of the network to (1024x768), I used the following augmentation_config section in the TLT specification file:

augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.5
  jitter: 0.3
  output_width: 1024
  output_height: 768
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}

After training, and after visualising the inference results with TLT on some test images to assess the model, I exported an .etlt file using tlt-export to use it with DeepStream 5.1.

Here is the export command:

!tlt yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_darknet53_nofreeze_norelu_dataok/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt \
                    -k $KEY \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_${EPOCH}_b1.etlt \
                    -e $SPECS_DIR/yolo_v4_retrain_resnet18_kitti.txt \
                    --batch_size 1 \
                    --data_type fp32

So in the DeepStream PGIE configuration file, in the [property] section, I have to use inference dims of 3x384x1248, otherwise the application crashes (wrong dimensions).

[property]
gpu-id=0
offsets=103.939;116.779;123.68
net-scale-factor=1
#0=RGB, 1=BGR
model-color-format=1
tlt-encoded-model=../models/model.etlt
tlt-model-key=nvidia_tlt
labelfile-path=../models/labels.txt
infer-dims=3;384;1248
tlt-encoded-model=../models/model.etlt
tlt-model-key=<some_encoding>
labelfile-path=../models/labels.txt
infer-dims=3;384;1248
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
network-type=0
num-detected-classes=15
is-classifier=0
maintain-aspect-ratio=0
output-blob-names=BatchedNMS
cluster-mode=3
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=../lib/post_processor/libnvds_infercustomparser_tlt.so

3x384x1248 seems to be the default in TLT for Yolo V4, but I thought I changed that by updating the augmentation_config section.

So how can I force DeepStream to use 3x1024x768, the inference dimensions that were used during training?

As a side question, how can we choose the values for offsets in the [property] section?

Thanks,

Johan

That is not expected. For your case, infer-dims needs to be 3;768;1024 (channels;height;width).

Those are the preprocessing mean values. Please do not change them.
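
To expand on that (a general note on DeepStream preprocessing, not specific to your model): nvinfer converts each input pixel as y = net-scale-factor * (x - offset), with one offset per channel, in the same channel order as the network input I believe (BGR here). The values in your config are the usual ImageNet means used by the sample TLT YOLOv4 configs, so keep them unless your training used a different normalization:

#0=RGB, 1=BGR
model-color-format=1
# per-channel means subtracted from every pixel (B;G;R here)
offsets=103.939;116.779;123.68
# each pixel becomes y = net-scale-factor * (x - offset); 1 keeps the mean-subtracted value as-is
net-scale-factor=1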

That’s my problem: if I set it to 3;768;1024, it crashes.

It seems that the network is stuck at an input size of 3x384x1248.

Here is the message I got when I set infer-dims=3;768;1024

INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 5
0   INPUT  kFLOAT Input           3x384x1248      
1   OUTPUT kINT32 BatchedNMS      0               
2   OUTPUT kFLOAT BatchedNMS_1    200x4           
3   OUTPUT kFLOAT BatchedNMS_2    200             
4   OUTPUT kFLOAT BatchedNMS_3    200  

....
ERROR: tlt/tlt_decode.cpp:274 failed to build network since parsing model errors.
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:797 Failed to create network using custom network creation function
ERROR: ../nvdsinfer/nvdsinfer_model_builder.cpp:862 Failed to get cuda engine from custom library API
0:00:04.171234278 19485 0x564d60618a10 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1735> [UID = 1]: build engine file failed       

Have you set tlt-encoded-model to your trained model?

Yes

DeepStream does not force a particular input size; it uses the dimensions of the model you give it.

I suggest you try running GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream; it includes sample models whose input size is 960x544.
If that runs successfully, then replace the sample model with your trained model.
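
In case it helps, a minimal sketch of fetching the sample apps (the model download, build and run steps depend on your DeepStream version, so please follow the README in the repository):

git clone https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git
cd deepstream_tao_apps
# the README explains how to download the sample .etlt models and build the
# custom BatchedNMS parser library before running the detection sample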

That’s what I understood as well.

I’ll give the sample app a try.

Thanks,

Johan

Hi again,

I could run the sample app successfully with the 960x544 input size.

I plugged in my model and still had the same issue: DeepStream only runs if I use an input size of 3x384x1248. It crashes with 3x768x1024. I also tried 3x1024x768 just to be sure I had not swapped the dimensions.

Here is some additional info: the backbone is CSPDarknet53, all images were resized to 1024x768 before training, and the augmentation_config section was set as previously mentioned. The anchor shapes were also generated using the 1024x768 size.

I just checked the training output, and the input layer has the correct size:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (None, 3, 768, 1024) 0                                            
__________________________________________________________________________________________________

Thanks,

Johan

Can you double check the config file in DeepStream? I can see some extra lines. For example, there are two tlt-encoded-model entries.

The config file is correct (I made a mistake when copying and pasting into the original post, hence the extra lines).

Here is the config file:

[property]
gpu-id=0
offsets=103.939;116.779;123.68
net-scale-factor=1
#0=RGB, 1=BGR
model-color-format=1
tlt-encoded-model=../models/model.etlt
tlt-model-key=nvidia_tlt
labelfile-path=../models/labels.txt
infer-dims=3;384;1248
#infer-dims=3;768;1024
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
network-type=0
num-detected-classes=15
is-classifier=0
maintain-aspect-ratio=0
output-blob-names=BatchedNMS
cluster-mode=3
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=../lib/post_processor/libnvds_infercustomparser_tlt.so

There is no other etlt file in the app directory.

I know it is very unlikely, but could the issue come from the tlt yolo_v4 export step?

No, it should not.
Actually, this is the first time I have seen this kind of issue from a user.
Some tips:

  1. Run tlt yolo_v4 evaluate and check in the log that you did indeed train a .tlt model with 1024x768.
  2. Run tlt-converter to generate a TensorRT engine from your .etlt model, and set that engine in the DeepStream config file with model-engine-file=your-trt-engine (see the sketch below).
    In this case, comment out tlt-encoded-model and tlt-model-key.
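
For reference, a minimal sketch of step 2. The key and file names are placeholders; for your YOLOv4 .etlt the output node is BatchedNMS, and -d takes the input dimensions as C,H,W, so they should match your training spec:

tlt-converter -k $KEY \
              -d 3,768,1024 \
              -o BatchedNMS \
              -t fp16 \
              -m 1 \
              -e yolov4_cspdarknet53.engine \
              model.etlt

Then, in the [property] section, set model-engine-file to the generated engine and comment out tlt-encoded-model and tlt-model-key.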

I solved my issue: it was actually in the export step.

But that was my mistake: the path to the spec file was not correct.

So it is resolved :).
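
For anyone hitting the same thing, the only change was passing -e the spec file I actually trained with (the spec file name below is just an example; use your own training spec):

!tlt yolo_v4 export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned_darknet53_nofreeze_norelu_dataok/weights/yolov4_cspdarknet53_epoch_$EPOCH.tlt \
                    -k $KEY \
                    -o $USER_EXPERIMENT_DIR/export/yolov4_cspdarknet53_epoch_${EPOCH}_b1.etlt \
                    -e $SPECS_DIR/yolo_v4_train_cspdarknet53_kitti.txt \
                    --batch_size 1 \
                    --data_type fp32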

Thanks for the help and sorry for the inconvenience.

Cheers!


Hi there!
Great work! Would you be able to help me?
I have trained yolo_v4 with TLT 3.0 using the sample notebook and the provided data. Now I am trying to run inference with the exported model in DeepStream 5.1. I am having a hard time understanding the config file, and because of that I am unable to perform inference and keep running into errors. Would you be kind enough to provide the simplest config file for a yolo_v4 model trained on the provided data?
Thank you

See GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream and https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps/tree/master/configs/yolov4_tlt
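
As a starting point, here is a minimal sketch of a PGIE config, based on the working config earlier in this thread and on the samples in that repository. The model path, key, label file, infer-dims and num-detected-classes are placeholders and must match your own export (as mentioned above, 3x384x1248 is the default input size in the sample spec):

[property]
gpu-id=0
net-scale-factor=1
offsets=103.939;116.779;123.68
#0=RGB, 1=BGR
model-color-format=1
# placeholders: point these at your exported model, export key and label file
tlt-encoded-model=../models/yolov4_cspdarknet53.etlt
tlt-model-key=<your_export_key>
labelfile-path=../models/labels.txt
# must match the output_width/output_height of your training spec, as C;H;W
infer-dims=3;384;1248
uff-input-order=0
uff-input-blob-name=Input
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
network-type=0
# set to the number of classes in your labels.txt
num-detected-classes=3
is-classifier=0
maintain-aspect-ratio=0
cluster-mode=3
output-blob-names=BatchedNMS
parse-bbox-func-name=NvDsInferParseCustomBatchedNMSTLT
custom-lib-path=../lib/post_processor/libnvds_infercustomparser_tlt.so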