Performance difference between various TLT models when deployed on DeepStream.

I trained a 5-class object detector using TLT. Since the default deepstream-reference app uses a DetectNet_v2 ResNet10 model, I used the same architecture for TLT training. The issue is that I am not getting good results in terms of accuracy, so I want to switch to another network such as SSD. Is there any performance chart or plot (in terms of speed and accuracy) comparing all the supported TLT architectures when deployed via DeepStream on the Nano? Also, what kind of performance degradation should I expect when deploying my own custom models (other than TLT)?

Hi neophyte1,
Since this is related to TLT, I am moving this topic from the DS forum to the TLT forum.

Hi neophyte1,
Different datasets give different results for the same TLT network.

For KITTI dataset,
The TLT detectnet_v2 network has the better mAP result, around 80% mAP.

For the COCO dataset (resized to 300x300),
As far as I know, the TLT SSD network has the better mAP result. As of now, I can get around 40% mAP @ IoU 0.5 for 80 classes.

And regarding FPS, every kind of TLT network needs pruning.
A good approach is to prune lightly, retrain, then repeat (prune --> retrain --> prune --> retrain) until you find a good balance between mAP and FPS.
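As a rough illustration of that loop, the sketch below just prints the prune and retrain commands for a few increasing pruning thresholds rather than executing them. The flag names (-m/-o/-pth/-k) and file names are my assumptions based on the TLT documentation; verify them with `tlt-prune --help` and `tlt-train --help` in your own TLT container before running anything for real.

```shell
# Dry-run sketch of the iterative prune -> retrain loop (assumed flag
# names; prints the commands instead of running them).
KEY="YOUR_NGC_KEY"
MODEL="resnet10_detector.tlt"
for PTH in 0.1 0.2 0.3; do
  # Prune with a gradually increasing threshold, then retrain the
  # pruned model, and compare mAP/FPS at each step.
  echo "tlt-prune -m ${MODEL} -o pruned_pth${PTH}.tlt -pth ${PTH} -k ${KEY}"
  echo "tlt-train detectnet_v2 -e retrain_spec_pth${PTH}.txt -r results_pth${PTH} -k ${KEY}"
done
```

After each retrain, you would evaluate mAP and measure FPS on the Nano, and stop increasing the threshold once accuracy starts to drop noticeably.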

For your case, I need to reproduce your experiment with COCO 5-class data and TLT detectnet_v2.
Your experiment: COCO data, 5 classes (person, bicycle, motorbike, bus, car), 512x512.
Correct me if anything is wrong. Thanks.

Hi Morganh,
By performance, I don’t just mean accuracy but also speed (hence, not dependent on a particular dataset).

“And regarding FPS, every kind of TLT network needs pruning.
A good approach is to prune lightly, retrain, then repeat (prune --> retrain --> prune --> retrain) until you find a good balance between mAP and FPS.”

But from my understanding, there is a limit to how much pruning one can do while maintaining respectable accuracy compared to the unpruned model. My fear is that if I pick too heavy a network, I might get the accuracy but not good FPS on the Nano even after pruning, and if I prune that network too aggressively, accuracy might suffer. Since the deepstream-reference app ships with the DetectNet_v2 architecture (ResNet10), and it works great in terms of both accuracy and speed, I have so far tried to base my training on it (the reason for my aversion to SSD). Hence, I wanted to know about any prior experiments with other networks on the Nano, to get a better idea of which ones might work in terms of both speed and accuracy.

“For your case, I need to reproduce your experiment with COCO 5-class data and TLT detectnet_v2.
Your experiment: COCO data, 5 classes (person, bicycle, motorbike, bus, car), 512x512”

Yes, my experimental setup is more or less the same, except for some added custom augmentations.

As mentioned in our previous discussions, I am seeing far too many false positives, mainly for the person class (often quite large and with high confidence).

Thanks for your help.

I can share my experience with pruning TLT detectnet_v2. For the KITTI or VOC datasets, even with a one-off heavy pruning, the mAP stays similar to the unpruned model. You can run experiments to confirm this.
So, the important thing for detectnet_v2 is to train a model with a better mAP in the first place.

Also, I am not sure why you set the COCO dataset to 512x512. Is that a requirement?
A smaller input size means fewer grid cells (512/16 * 512/16) for detectnet_v2, and that may hurt mAP.
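To make the grid arithmetic concrete: detectnet_v2 predicts on a stride-16 output grid, so an NxN input yields (N/16)x(N/16) cells. The input sizes below are just illustrative, not recommendations:

```shell
# Number of detectnet_v2 output grid cells for a few square input
# sizes, assuming the usual stride of 16.
for SIZE in 384 512 640 960; do
  G=$((SIZE / 16))
  echo "${SIZE}x${SIZE} input -> ${G}x${G} grid = $((G * G)) cells"
done
```

So 512x512 gives a 32x32 grid (1024 cells); shrinking the input further reduces the number of cells quadratically, which is why a smaller size can hurt mAP.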

Hi Morganh,
Thanks for the update. Kindly share your experience with other datasets as well; it would be quite helpful, and I might gain some insight into the whole training process.

As mentioned before, mAP is not my primary concern. From my past experience (outside TLT), I have found that a good mAP on a dataset does not necessarily translate into good real-life detections. I am using COCO because it is a difficult dataset to train on, and even though the mAP generally comes out lower than on other datasets, the real-world inferences tend to be good.

No, 512x512 is not a requirement. I chose this size for the same reason I am not going for bigger architectures: it is large enough to get good mAP, yet small enough that performance will not suffer much when I deploy on the Nano.

I might go for other input sizes if they give a good accuracy-vs-speed tradeoff.

Another thing I would like to know: does TLT generate negative samples internally during training?

No, TLT does not generate negative samples internally during training.