Odd bbox using TLT custom model in Deepstream

Please provide complete information as applicable to your setup.
• Hardware Platform: GPU
• DeepStream Version: 5.0-ga

Hi,
I trained a model with TLT to detect cars, bikes, license plates, and a dontcare class. The model works great when testing with TLT inference, but when deploying it to DeepStream it outputs some giant bounding boxes of class ‘car’. Tests were made with FP32 and INT8 engines with the tracker disabled.
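
For context, the FP32/INT8 switch and the model hookup live in the nvinfer config. A minimal sketch of the relevant [property] entries for a detectnet_v2 TLT model is below; the file names, key, and blob names are placeholders based on the standard TLT sample configs, and the real values are in the drive folder.

  [property]
  tlt-encoded-model=resnet18_detector.etlt
  tlt-model-key=<your NGC key>
  labelfile-path=labels.txt
  int8-calib-file=calibration.bin
  uff-input-blob-name=input_1
  output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
  num-detected-classes=4
  # 0=FP32, 1=INT8, 2=FP16
  network-mode=0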

Here is a folder with the TLT and DeepStream outputs and the configuration files: https://drive.google.com/drive/folders/1v3hn6A8Oy2okZSM08LbjvOD3L2MfMPrN?usp=sharing

Does anyone have an idea of what could be happening?

I have the same problem; stay tuned.

Moving this topic into the TLT forum.

Could you please share the files below too?

  1. the TLT training spec
  2. the label file used when deploying in DS
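
For reference, the label file for a detectnet_v2 model in DS is just one class name per line, in the same order as the target classes in the training spec; a mismatch in order or count can produce wrongly classified boxes. A hypothetical example with made-up class names:

  car
  bicycle
  lpd
  dontcare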

Also, what is CarlosV2/config_infer.txt?

Files added to the drive (spec in the TLT folder and labels in the deepstream folder).

That’s a folder where I put the config file; I just forgot to change the path. It should be config_infer_primary.txt (already fixed in source1_ds.txt).

Could you please try to run in FP16 mode to see if it reproduces?

Also, you can try to narrow down the issue with
https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps.

Generate a test h264 file, then run the command below.
$ ./deepstream-custom -c config_infer_primary.txt -i test.h264 -d
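
If you do not already have a raw H.264 elementary stream, one way to produce one from an mp4 (assuming ffmpeg is available; file names are placeholders) is:

$ ffmpeg -i input.mp4 -an -c:v copy -bsf:v h264_mp4toannexb test.h264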

Couldn’t run FP16 (hardware limitation).
Here is the output:

WARNING: …/nvdsinfer/nvdsinfer_model_builder.cpp:759 FP16 not supported by platform. Using FP32 mode.
WARNING: …/nvdsinfer/nvdsinfer_model_builder.cpp:1291 FP16 not supported by platform. Using FP32 mode.

Following the instructions in https://github.com/NVIDIA-AI-IOT/deepstream_tlt_apps, I get the same output as with deepstream-app.

I still don’t know if it’s a training or deployment problem. Today I ran the training again and tested a checkpoint (epoch 70 of 120) in DeepStream, and it works great. Nevertheless, the issue persists when testing with the finished model. However, using !tlt-infer detectnet_v2 with both the checkpoint and the finished model, all inferences are perfect.
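
For reference, a tlt-infer invocation in TLT 2.0 looks roughly like the sketch below (spec file, image directory, output directory, and key are placeholders; the .tlt model to test, checkpoint or finished, is the one set inside the inference spec):

$ tlt-infer detectnet_v2 -e detectnet_v2_inference.txt -i test_images/ -o tlt_infer_out/ -k $KEY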

If your platform does not support FP16, it is fine to narrow down the issue with FP32 mode.

If the 70th-epoch checkpoint runs perfectly, I am afraid it is not a deployment problem.
How do the mAP of the 70th epoch and the 120th epoch compare?
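
You can get both numbers with tlt-evaluate against the same training spec, pointing -m at each .tlt file in turn (paths and key below are placeholders):

$ tlt-evaluate detectnet_v2 -e detectnet_v2_train.txt -m epoch_070.tlt -k $KEY
$ tlt-evaluate detectnet_v2 -e detectnet_v2_train.txt -m epoch_120.tlt -k $KEY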

Problem solved! Some parts of the dataset had wrong labels.

I split the complete dataset into smaller subsets and trained with each of them separately. One of those subsets had bad annotation labels.
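
In case it is useful to someone else: assuming KITTI-format label files, a quick sanity check like the one below flags boxes whose coordinates are degenerate (left >= right or top >= bottom), which is one way such bad annotations can show up. The labels/ path is a placeholder.

$ awk '($5 >= $7) || ($6 >= $8) { print FILENAME ": " $0 }' labels/*.txt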