Erroneous detections when a TLT-trained model is deployed and tested with DeepStream

Hello,

I have been trying end-to-end training and deployment of a model using TLT and DeepStream, but I am now stuck and looking for some help. Thanks.

Problem:
I have tested end-to-end training and deployment, but I am unable to get proper bounding boxes around objects.

Frameworks used:

  • Training: nvcr.io/nvidia/tlt-streamanalytics : v1.0_py2
  • Deployment: nvcr.io/nvidia/deepstream : 4.0.1-19.09-devel

Steps followed:

  • Followed the example provided in the TLT container at /workspace/examples/detectnet_v2, without any changes to the script (notebook PDF attached).
  • Used 8 GPUs for training, using the multi-GPU training script embedded in the example.
  • Detection results were good and objects were properly bounded by boxes (results attached).
  • The following files were generated successfully: "calibration.bin", "calibration.tensor", "resnet18_detector.etlt", "resnet18_detector.trt". (The etlt model is attached for reference.)
  • These files were exported to DeepStream and used with a slightly modified version of the example found at "/root/deepstream_sdk_v4.0.1_x86_64/sources/apps/sample_apps/deepstream-test1". (The modifications can be found at https://gist.github.com/AlexTech009/2b22805e6cfbb7ecf0c86ca004fcc6b3).
  • The sample video at "/root/deepstream_sdk_v4.0.1_x86_64/samples/streams/sample_720p.h264" was used for inference testing.

Expected results:

  • Since no modifications were made to the training script and only minor ones to the DeepStream sample, we expected it to detect the cars and pedestrians (persons) that show up in the video. However, it produces erroneous bounding boxes, if any.

Actual results:

  • In INT8 mode, with network-mode=1, no bounding boxes appear.
  • In FP32 mode, with network-mode=0, boxes appear but in the wrong places. (Results attached.)

detectnet_v2_notebook.pdf (14.9 MB)
resnet18_detector.etlt.zip (42.8 MB)
TLT_Detection_Results.jpg
Deepstream_Detection_Result.jpg

Hi,

Thanks for your update.
We are checking this issue and will share more information with you later.

Thanks.

Hello AlexTech009,

Thanks for using TLT and DeepStream!

From your result images, TLT detection works fine but DeepStream detection produces wrong results, so this looks like an issue in the DeepStream deployment step.

If you train at input size 1248x384 and then deploy with DS at input-dims=3;1280;720;0, the video will be scaled heavily and objects in the video can become very large. Objects in the KITTI dataset are not very large, so a model trained on that dataset may have difficulty detecting large objects in the scaled video. This could be the reason. Can you try a smaller input-dims in DS?
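To make the mismatch concrete, here is a minimal Python sketch that compares an input-dims value against the training resolution. It assumes the value is ordered channels;height;width;order, which is how I recall the Gst-nvinfer documentation describing it; please verify against your DeepStream version. The helper names parse_input_dims and compare_to_training are made up for illustration and are not part of TLT or DeepStream.

```python
def parse_input_dims(value):
    """Parse an input-dims string, ASSUMED to be 'channels;height;width;order'."""
    parts = [int(p) for p in value.strip().split(";") if p]
    if len(parts) != 4:
        raise ValueError("expected 4 fields, got %d" % len(parts))
    return tuple(parts)


def compare_to_training(input_dims, train_w, train_h):
    """Report how the deployed network size relates to the training size."""
    _channels, height, width, _order = parse_input_dims(input_dims)
    return {
        "deploy_w": width,
        "deploy_h": height,
        "matches": (width, height) == (train_w, train_h),
        # Ratios far from 1.0 mean the network sees objects at a very
        # different scale than it did during training.
        "scale_x": width / train_w,
        "scale_y": height / train_h,
    }


# Value from this thread versus the 1248x384 training resolution.
report = compare_to_training("3;1280;720;0", 1248, 384)
print(report)
```

Under the same assumption about field order, a value of 3;384;1248;0 would match the training resolution exactly.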

Thanks.

Please also check the DetectNet output parser. You can also use the ResNet models instead.

Please refer to this GitHub repository for more information.
It shows how to enable a customized bbox parser for DetectNet.
https://github.com/AastaNV/DeepStream
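For orientation, here is a rough Python sketch of what a DetectNet-style bbox parser does with the coverage and bbox output tensors. It is not the code from the repository above (real DeepStream custom parsers are written in C++). The bbox scale of 35.0, the 0.5 cell offset, and the function name decode_detectnet are assumptions based on the commonly described DetectNet_v2 decoding, so check them against your model before relying on the numbers.

```python
def decode_detectnet(cov, bbox, net_w, net_h, threshold=0.2,
                     bbox_norm=35.0, offset=0.5):
    """Decode one class of grid outputs into boxes in network-input pixels.

    cov  : grid_h x grid_w nested list of coverage (confidence) values
    bbox : four grid_h x grid_w planes, in the order x1, y1, x2, y2
    bbox_norm and offset are ASSUMED values, not confirmed by this thread.
    """
    grid_h, grid_w = len(cov), len(cov[0])
    stride_x = -(-net_w // grid_w)  # ceiling division
    stride_y = -(-net_h // grid_h)
    boxes = []
    for gy in range(grid_h):
        cy = (gy * stride_y + offset) / bbox_norm
        for gx in range(grid_w):
            score = cov[gy][gx]
            if score < threshold:
                continue
            cx = (gx * stride_x + offset) / bbox_norm
            # Outputs are normalized distances from the cell center to each edge.
            x1 = (bbox[0][gy][gx] - cx) * -bbox_norm
            y1 = (bbox[1][gy][gx] - cy) * -bbox_norm
            x2 = (bbox[2][gy][gx] + cx) * bbox_norm
            y2 = (bbox[3][gy][gx] + cy) * bbox_norm
            # Clamp to the network input area.
            x1 = min(max(x1, 0.0), net_w - 1.0)
            y1 = min(max(y1, 0.0), net_h - 1.0)
            x2 = min(max(x2, 0.0), net_w - 1.0)
            y2 = min(max(y2, 0.0), net_h - 1.0)
            boxes.append((x1, y1, x2, y2, score))
    return boxes


# Tiny 2x2 grid on a 32x32 input: only the bottom-right cell fires.
cov = [[0.0, 0.0], [0.0, 0.9]]
plane = [[0.2, 0.2], [0.2, 0.2]]
boxes = decode_detectnet(cov, [plane, plane, plane, plane], 32, 32)
print(boxes)
```

The boxes come out in network-input coordinates, so a parser that assumes a different input size or stride than the deployed model will draw them in the wrong places. The raw boxes would normally still be clustered or passed through non-maximum suppression afterwards.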

Please also refer to the same issue on the TLT forum.
https://devtalk.nvidia.com/default/topic/1064422/transfer-learning-toolkit/finding-inaccurate-result-while-testing-model-tlt-trained-model-with-deepstream-/