I have trained a model using DetectNet_v2 in NVIDIA TLT. When I run inference on the test images, for two persons standing next to each other, it draws a single bounding box covering both people at the centre. What is a possible solution for this?
You need more training data that contains similar scenarios, i.e., two persons standing next to each other.
Add these images and their labels to the dataset, then trigger training again.
More reference: Detecting stacking of products
Will changing the DBSCAN parameters in the training spec file help resolve this to some extent?
If you want to fine-tune the DBSCAN parameters, you can run tlt-infer directly and set different DBSCAN values in the tlt-infer spec file.
For an example, refer to the command and spec file in my comment in the thread Retraining peoplenet model with own images.
Then try setting a lower dbscan_eps, run tlt-infer, and check the inference result.
dbscan_eps: (float) The search distance to group together boxes into a single cluster. The lesser the number, the more boxes are detected.
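To illustrate, here is a rough sketch of the clustering section of a tlt-infer spec file for DetectNet_v2, with dbscan_eps lowered so that nearby people are less likely to be merged into one box. The exact field names and default values may differ between TLT versions, so treat the numbers below as placeholders to tune rather than recommended settings:

```
# Hypothetical bbox_handler_config fragment for tlt-infer (DetectNet_v2).
# Lowering dbscan_eps tightens clustering, so two adjacent persons are
# more likely to get separate boxes instead of one merged box.
bbox_handler_config {
  kitti_dump: true
  disable_overlay: false
  overlay_linewidth: 2
  classwise_bbox_handler_config {
    key: "person"
    value {
      confidence_model: "aggregate_cov"
      output_map: "person"
      clustering_config {
        coverage_threshold: 0.005
        dbscan_eps: 0.15          # try lowering from e.g. 0.3
        dbscan_min_samples: 0.05
        minimum_bounding_box_height: 4
      }
    }
  }
}
```

After each change, rerun tlt-infer on the same test images and compare the overlays to find a value that separates the two people without fragmenting single detections.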
Additionally, you can try a bigger backbone with the DetectNet_v2 network.