NVIDIA TrafficCamNet v1.0 performs well in most cases, but some vehicles go undetected when cameras are mounted with a “bird’s-eye” perspective.
I would like to improve the existing detector by reusing the original weights and extending the model in TLT with additional images labelled as “car”.
For that purpose I’ve used 9,000 “bird’s-eye” images containing about 250K labelled cars.
Most of them are from this dataset: Parking Lot Database - Laboratório Visão Robótica e Imagem
Initial training ran for 120 epochs with no frozen layers, but the resulting model overfit to the image type and perspective of the new dataset.
In a second TLT run I froze the model layers, hoping that would preserve the original weights, but training converged too quickly and again overfit after only 3 or 4 epochs.
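For context, a sketch of what I understand the freezing approach to look like in the detectnet_v2 spec — freezing only the earliest ResNet blocks and lowering the learning rate so later blocks can still adapt slowly (the block indices, paths, and learning-rate values here are illustrative assumptions, not tested recommendations):

```
model_config {
  pretrained_model_file: "/workspace/tlt/trafficcamnet/resnet18_trafficcamnet.tlt"  # assumed path
  arch: "resnet"
  num_layers: 18
  freeze_blocks: 0      # freeze only the earliest blocks,
  freeze_blocks: 1      # leaving later blocks free to adapt
}
training_config {
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-7   # illustrative: roughly an order of
      max_learning_rate: 5e-5   # magnitude below the spec defaults
      soft_start: 0.1
      annealing: 0.7
    }
  }
}
```

Freezing everything (or freezing plus the default learning rate) may be what drives the fast convergence I’m seeing, so this partial-freeze variant is one of the things I’d like feedback on.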
I would like the model resulting from TLT training to keep accuracy similar to the original for “car” detection, while the same “car” class also handles “bird’s-eye” perspective images.
Here is the complete training specification I’m using: train_resnet18_kitti.txt - Pastebin.com
Is there anything I can tweak at the model or training-config level?
Or should the dataset I’m using in TLT be augmented with images similar to the ones in NVIDIA’s private dataset, so that the model does not “forget” such cases during TLT training?
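If mixing in road-level data is the way to go, a minimal sketch of blending the bird’s-eye set with a public street-level set (both in KITTI layout, i.e. matching `images/` and `labels/` stems) might look like this — the directory names, `.png` extension, and the 30% mixing ratio are assumptions for illustration:

```python
import random
import shutil
from pathlib import Path

def mix_datasets(birdseye_dir, roadlevel_dir, out_dir, road_fraction=0.3, seed=0):
    """Copy all bird's-eye samples plus a random fraction of road-level
    samples (KITTI layout: images/ and labels/ with matching stems) into
    out_dir, prefixing filenames to avoid stem collisions."""
    out = Path(out_dir)
    for sub in ("images", "labels"):
        (out / sub).mkdir(parents=True, exist_ok=True)

    def copy_pairs(src_dir, stems, prefix):
        src = Path(src_dir)
        for stem in stems:
            for sub, ext in (("images", ".png"), ("labels", ".txt")):
                shutil.copy(src / sub / f"{stem}{ext}",
                            out / sub / f"{prefix}_{stem}{ext}")

    bird_stems = [p.stem for p in Path(birdseye_dir, "images").glob("*.png")]
    road_stems = [p.stem for p in Path(roadlevel_dir, "images").glob("*.png")]
    random.seed(seed)
    k = int(len(road_stems) * road_fraction)
    copy_pairs(birdseye_dir, bird_stems, "bird")
    copy_pairs(roadlevel_dir, random.sample(road_stems, k), "road")
    return len(bird_stems) + k
```

The prefixes keep the merged folder valid when both sources use numeric stems; the ratio would of course need tuning against validation accuracy on both perspectives.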