YOLOv4 vs. Detectnet_v2 Performance

I am working on a model for traffic analytics that involves 8 classes with quite a bit of overlap in the class structure. Of the 8 classes, 5 are car-like vehicles (truck, van, car, etc.) and the other 3 are pedestrians, bikes, and motorcycles. I have trained a couple of iterations of both model types, and the YOLO models absolutely blow the detectnet_v2 models out of the water in terms of both mAP and deepstream performance. I am looking for direction on closing the gap between the two, because detectnet is far easier for deepstream integration and deployment on hardware.

Hardware - 2x 2080ti for training and dGPU deepstream inference.
Hardware Deployment - Jetson TX2/Xavier
TLT Version:
dockers:
nvcr.io/nvidia/tlt-streamanalytics:
docker_tag: v3.0-dp-py3
Training Spec Files:
train_YOLO.txt (2.4 KB)
train_detectnet.txt (10.0 KB)
YOLO mAP - ~75%
Detectnet_v2 mAP - ~35%

Moving Forward - I plan to migrate as many of the YOLO settings as possible to the detectnet training config, to get as close to an apples-to-apples comparison as I can. Beyond that, I am open to any hints/tips/tricks for bringing the detectnet model up to speed on such a dataset.

I see that you are training the two models with different input sizes.
In yolo_v4, it is 1248x384.
In detectnet_v2, it is 960x544.

For detectnet_v2, since you are running the 3.0-dp-py3 version, you need to resize the images/labels offline. Did you resize the images/labels to 960x544?
I suggest you update TLT to the 3.0-py3 version. See the DetectNet_v2 section of the Transfer Learning Toolkit 3.0 documentation; with that version you do NOT need to resize offline.

Also, how about training yolo_v4 at the same input size, 960x544?
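
For anyone staying on 3.0-dp-py3, the label half of the offline resize mentioned above could look roughly like the sketch below. It assumes KITTI-format label files (the format detectnet_v2 is normally trained from), and the function name and the 1920x1088 source resolution are made up for illustration; the thread does not state the original image size. The images themselves would still need to be resized to the same target with an image library.

```python
def scale_kitti_label_line(line, src_size, dst_size):
    """Scale the bbox of one KITTI label line from src_size to dst_size.

    src_size and dst_size are (width, height) tuples in pixels.
    Hypothetical helper for illustration, not part of TLT.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor

    fields = line.split()
    # KITTI fields 4..7 hold xmin, ymin, xmax, ymax in pixels.
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{xmin * sx:.2f}",
        f"{ymin * sy:.2f}",
        f"{xmax * sx:.2f}",
        f"{ymax * sy:.2f}",
    ]
    return " ".join(fields)


# Example: a label drawn on an assumed 1920x1088 frame, rescaled to 960x544.
label = "car 0.00 0 0.00 100.00 50.00 300.00 150.00 0 0 0 0 0 0 0"
print(scale_kitti_label_line(label, (1920, 1088), (960, 544)))
```

If the source and target aspect ratios differ, the boxes are stretched along with the image, which is worth keeping in mind when comparing against a model trained at 1248x384.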

Thanks for the update. I have now started a detectnet run at the same input size as YOLO, with matching learning rates, regularizer settings, etc. I did not resize offline.
Are there any release notes for the container version changes? Should I be aware of any potential breaking changes between the two images?

In the 3.0-py3 version, as mentioned, the detectnet_v2 dataloader does support resizing images to the input resolution defined in the specification file. From the documentation:

The train tool does not support training on images of multiple resolutions. However, the dataloader does support resizing images to the input resolution defined in the specification file. This can be enabled by setting the enable_auto_resize parameter to true in the augmentation_config module of the spec file.

BTW, we cannot conclude that yolo_v4 and detectnet_v2 will reach the same performance even if we train them with the same lr, bs, etc. They are two different networks, and both have pros and cons.

Sure, I understand they will both be different. The initial models were just VERY different, so I wanted to close that gap a little bit. I will update to the new container version and flag the augmenter to resize the input images before training and try again.

Hi Morganh,

This last run seems even worse than past attempts. The mAP value was <1.0 with the AP value of car oscillating up and down throughout training and the model almost never breaking 1.0 for other classes. Can you confirm that my setup is correct for having already split my train/eval data? This result is with ‘enable_auto_resize=True’ and slower learning rates similar to YOLO.

Can you share both training logs and training spec files? Thanks.

I ended up getting some pretty good results after tinkering with more settings. For those looking to increase performance, I had some success by freezing blocks 0 and 1, using the updated TLT 3.0 container that supports resizing (and setting that flag), and changing “all_projections”

Thanks for the info.
