Getting erroneous detection in TLT detection example.

After running the sample detection script I get false detections. Many bounding boxes are generated, even for objects that aren’t present in the image.

Process adopted
I tried the detection example in the /examples folder of the TLT docker container and executed every step in the provided Jupyter notebook without any errors.

Expected result
I ran “tlt-infer detection” on a test image and expected it to detect the person present in the image.

Actual results

I got an output image with lots of unnecessary bounding boxes. The actual person was not detected.

Looking for assistance with TLT. No changes were made to the default example script, yet I am still getting erroneous results. Attaching a printout of the Jupyter notebook results.

detection-report.pdf (100 KB)
detection_notebook.pdf (5.88 MB)

If someone has been able to use TLT successfully, could they please post the steps? There is little help available elsewhere, and the community lacks proper moderators, or they might be busy with other things.

Most of the posts on this TLT forum have lots of views but no replies.

I have tried TLT at least 10-15 times, but every time I get false results, and a lot of detections at that. Any help would be appreciated.

I am in a similar situation in general: if I follow all the steps, the result is terrible. However, if I skip the “prune” and “retrain” steps and run inference directly on the unpruned network, the result is good.

Step 9:

# Running inference for detection on n images
!tlt-infer detection -i $USER_EXPERIMENT_DIR/data/VOCdevkit/VOC2012/JPEGImages_kitti/test \
                     -o $USER_EXPERIMENT_DIR/tlt_infer_testing \
                     -ek $API_KEY \
                     -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt \
                     -cp $SPECS_DIR/det_clusterfile_pascal_voc.json \
                     -k -bo -lw 3 \
                     -g 0 \
                     -bs 64

I get a reasonably good result. For the experiment I used an NVIDIA GP106-100.

I also tried training a face detection network on part of the WIDER FACE dataset. I got a decent result without the prune step, and a lot of wrong bounding boxes after the prune and retrain steps.

If someone has solved this problem, please point me to a guide!

Hi gseepaksh,
Sorry for the late reply. I will be the moderator of the TLT forum going forward. From your attached pdf, it seems that the accuracy is not optimized. I also found the comment below on the first page of detection_notebook.pdf.
"This notebook shows an example usecase of Object Detection using Transfer Learning Toolkit. It is not optimized for accuracy"

BTW, the next release of TLT will be available in two days. You may want to try again with it.
I will also go through your steps to see whether I hit the same issue.

Hi gseepaksh,
After checking your attached notebook log, I see on page 15 that you set the prune threshold too high:
-pth 0.94

This prunes too much: as page 17 shows, only 0.46% of the parameters are kept.

Please set the threshold to 0.1, which is the default we set.
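
To illustrate why such a high threshold removes almost everything, here is a toy sketch in Python. It is not TLT's actual pruning code. It assumes, for illustration only, that a filter is kept when its norm divided by the largest norm in the layer is at least the threshold. The function name kept_fraction and the random norms are hypothetical.

```python
import random


def kept_fraction(filter_norms, threshold):
    """Return the fraction of filters that survive pruning.

    Assumption (not taken from TLT): a filter survives when its norm,
    normalized by the largest norm in the layer, is >= threshold.
    """
    max_norm = max(filter_norms)
    kept = [n for n in filter_norms if n / max_norm >= threshold]
    return len(kept) / len(filter_norms)


# Hypothetical layer: 1000 filters with made-up norms in [0, 1).
random.seed(0)
norms = [random.random() for _ in range(1000)]

# Compare the default-like threshold with the one from the notebook log.
for pth in (0.1, 0.5, 0.94):
    print(f"pth={pth}: {kept_fraction(norms, pth):.1%} of filters kept")
```

Under this assumption the share of surviving filters shrinks as the threshold rises. In a convolutional layer the parameter count depends on both the input and the output channels, so the share of parameters kept can fall much faster than the share of filters kept.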

Is the next release available? It's been about 15 days since that post.

Sorry for missing the release date I committed to. Actually, the GA release will be available soon, but I am not sure of the exact time as of now.
For your issue, please set the pth value lower. A value of 0.9 prunes too much, as I commented above.

Thanks. I will give it a shot. Do you have some example for LSTM as well?