Inaccurate results while testing a TLT-trained model with DeepStream

Hello sir,
I tried to train and deploy a model using TLT and DeepStream, but I am not able
to get boxes around the objects.

Steps :

→ Followed the process as provided in the TLT container.

→ I used the multi-GPU training script for training the model. The number of GPUs used was 8.

→ I was able to generate all four files: “resnet18_detector.trt”, “calibration.tensor”, “resnet18_detector.etlt”, and “calibration.bin”. In the detection
results I found bounding boxes around the objects.

→ All four generated files were exported for deployment with DeepStream. The deployment was based on the example
found at “/root/deepstream_sdk_v4.0.1_x86_64/sources/apps/sample_apps/deepstream-test1”.

→ I also made some small changes in the file.

→ I tried to test the results with the help of sample images.

When used with DeepStream, either no bounding boxes appear (INT8 mode, network-mode=1) or the boxes are in the wrong place (FP32 mode, network-mode=0).
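For reference, a minimal sketch of the relevant entries in my pgie config (values are from my setup; the model file names are the four files listed above):

```ini
[property]
net-scale-factor=1
tlt-encoded-model=resnet18_detector.etlt
int8-calib-file=calibration.bin
# network-mode: 0 = FP32, 1 = INT8, 2 = FP16
network-mode=1
input-dims=3;1280;720;0
```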

Framework used for training: v1.0_py2

Framework used for deployment: 4.0.1-19.09-devel

Hi johncarry,
Could you please paste your config file for running deepstream-test1?
Also, what are the small changes against deepstream-test1?

Please find the config file attached:
[url]The modifications can be found at[/url]

Hi johncarry,
Can you attach your training spec file too? I want to check your output_image_width and output_image_height, etc.

Hi johncarry,
Please check if “input-dims” matches the height/width of the model input.
Your dstest1.pgie_config.txt shows “input-dims=3;1280;720;0”.
Does it match output_image_width/height of training spec?

Also, could you set net-scale-factor to 0.0039215697906911373 as the TLT doc mentions?

→ If I set net-scale-factor to 0.0039215697906911373 and the height and width (input-dims=3;1284;384;0) as you said, then the output is (attachment-1):

→ Then I changed net-scale-factor to 1 (input-dims=3;1284;384;0) (attachment-2):

→ Then the TLT training config file (attachment-3):

tlt-train-config.txt (5.26 KB)

Hi johncarry,
Can you set input-dims=3;1248;384;0 ? Not 1284.

BTW, what dataset did you use for training?

Please check “Model Requirements” in Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.

For DetectNet_v2, the tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size, and the corresponding bounding boxes must be scaled accordingly.

As given in the TLT DetectNet example, we used the dataset provided here: [url]The KITTI Vision Benchmark Suite. However, no resizing was done, as no such step was mentioned in the example notebook. Do we have to resize the images before training?

Yes, if your dataset’s resolution is not almost the same as your spec (1248,384), it is necessary to resize offline to match your final training size.
Really sorry for the missing guidance in the Jupyter notebook. This guidance is only available in the TLT doc.
I will ask internal team to highlight it in notebook.
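To make the offline resize concrete, here is a rough Python sketch of the bounding-box math only (the `scale_bbox` helper and the 1242×375 → 1248×384 sizes are my assumptions for a typical KITTI frame; image resizing itself would be done with any image library):

```python
def scale_bbox(bbox, src_size, dst_size):
    """Scale a KITTI-style (x1, y1, x2, y2) box when the image is
    resized from src_size to dst_size.

    src_size and dst_size are (width, height) tuples.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = bbox
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# Example: a box in a typical 1242x375 KITTI frame, rescaled for the
# 1248x384 resolution set in the training spec.
box = scale_bbox((100.0, 50.0, 200.0, 150.0), (1242, 375), (1248, 384))
```

Every label file would need the same scaling applied to its boxes, and every image resized to the spec resolution, before running tlt-train.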

Hi johncarry,
I have found the cause of your problem. Please modify the parameters below in your dstest1_pgie_config.txt.

  1. set net-scale-factor=0.0039215697906911373
  2. set input-dims=3;384;1248;0

Then run make clean and make to rebuild the deepstream-test1 app.
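Putting the two changes together, the relevant lines of dstest1_pgie_config.txt become (other entries unchanged):

```ini
[property]
net-scale-factor=0.0039215697906911373
# c;h;w;0 -> 3 channels, height 384, width 1248, CHW format
input-dims=3;384;1248;0
```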

Thanks Morganh for the reply.

So I am taking these two points from the discussion:

  1. Images need to be resized before training with TLT to match the spec.
  2. Input dimension format is: NCHW.

One quick question: in DeepStream, do we need to resize the video as well to match input-dims, or is it handled internally by DeepStream based on the input-dims config?


  1. See [url]Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation:
    for DetectNet_v2 and SSD, it is necessary to resize images offline to match what you set in your training spec.

  2. You can see the comment inside dstest1_pgie_config.txt as below. So, for your case, it should be 3;384;1248;0,
    where c = number of channels, h = height of the model input, w = width of the model input, and 0 implies CHW format.

  3. No need to resize the video; DeepStream handles that internally. The input-dims setting is only related to the model.

BTW, in detectnet_v2 notebook, the dataset is KITTI by default.
You do not need to resize, since the KITTI dataset resolution almost matches the default setting of the training spec.