How to fine-tune the DeepStream resnet10 model

I’m building a DeepStream application using the provided 4-class resnet10 detector.
I notice it gets a lot of false detections in low-light situations, for example around sunset - it’s still light out, but not bright and sunny. Once it gets dark and the IR on my cameras turns on, it works well again.

These false detections come with very high confidence, e.g. 0.97 for a person that is actually just a tree, or a table it keeps classifying as a bicycle. Because of these high confidence values I can’t filter them out with a threshold.
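For reference, the only filtering I have on the DeepStream side is the per-class threshold in the nvinfer config, roughly like this (the value shown is illustrative - the point is that a 0.97 false positive sails past any threshold I could reasonably set):

```
# nvinfer config fragment (sketch) - per-class confidence filtering.
# A tree scoring 0.97 as "person" passes any usable threshold here.
[class-attrs-all]
pre-cluster-threshold=0.5
```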

So I was thinking I could fine-tune the model with TLT and provide it with more images taken in low-light situations.

How do I go about this - where do I find the trained resnet10 models used by DeepStream? Would I be better off trying a resnet18 or a DetectNet, and if so, are those already pre-trained or do I have to train from scratch? How many additional images would I need for fine-tuning - any best practices?

I’ve read through the documentation, but there don’t seem to be any tutorials that cover these types of questions. If there are, please point me in the right direction… The only tutorial I can find is this one (How to Build and Deploy Accurate Deep Learning Models for Intelligent Image and Video Analytics | by NVIDIA AI | DataSeries | Medium), but it’s for training a resnet18 from scratch, not for fine-tuning an existing DeepStream model.

–jason

Hi jason,
The existing DeepStream model cannot be used as a pretrained model in TLT. TLT provides pre-trained models on ngc.nvidia.com, but please note that they are different from the existing DS models.
Currently, only the NGC models are compatible with TLT. After setting one as the pre-trained model in TLT, users can run training with their own dataset.
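For a DetectNet_v2-style model, pointing TLT at the downloaded NGC weights is done in the training spec. A rough sketch of the relevant section (paths and values here are placeholders, not a definitive spec):

```
# TLT detectnet_v2 training spec fragment (sketch; path is a placeholder)
model_config {
  pretrained_model_file: "/workspace/pretrained/resnet18.hdf5"
  arch: "resnet"
  num_layers: 18
}
```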

ok thanks @Morganh for that info… So what would you suggest I do - is there a specific model in NGC that is close to the DeepStream one and trained on the same dataset?

Sorry, the TLT pre-trained models in NGC are completely different, so none of them is close to the DeepStream one.

Hmmm… So last question - can you tell us what dataset the DeepStream models were trained on? Then at least I could do the same.

The dataset is Nvidia-internal.
For your case, you can collect as much of your own data as possible, especially for your specific scenario, then train with TLT.
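If you do collect your own data, note that TLT’s detection networks (e.g. DetectNet_v2) expect KITTI-format labels: one label file per image, one line per object. A minimal sketch of writing those lines (the helper name is hypothetical; for 2D detection training only the class name and the bbox corners matter, so the remaining KITTI fields can stay zero):

```python
# Sketch: emitting KITTI-format label lines for a custom dataset.
# KITTI lines have 15 fields: class, truncation, occlusion, alpha,
# bbox (x1 y1 x2 y2), 3D dims (h w l), location (x y z), rotation_y.
# For TLT 2D detection only class + bbox are used; the rest are zeros.

def kitti_label(cls_name, x1, y1, x2, y2):
    """Return one KITTI label line for a 2D detection box."""
    return (f"{cls_name} 0.00 0 0.00 "
            f"{x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00")

# Example: a person annotated at pixel box (10, 20)-(110, 220)
print(kitti_label("person", 10, 20, 110, 220))
# → person 0.00 0 0.00 10.00 20.00 110.00 220.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```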

That’s what I’m thinking… If there is a good detector for persons and cars in NGC, I’ll just collect my own “low light” images and fine-tune on those…
I’m not sure that will remove the false detections though - maybe the NGC models are just much better.

I imagine the DeepStream ones have been stripped down to the bare bones to get 30 fps performance, etc.

To reach the target mAP and fps, lots of experiments are needed.
TLT is designed to enable NVIDIA customers to fine-tune pre-trained models with their own data.

You can have a look at PeopleNet and use it for re-training.
https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet