Reproducing YoloV4 COCO mAP

vcmike · February 9, 2022, 3:32am

Hardware Platform (Jetson / GPU)
GPU

DeepStream Version
nvcr.io/nvidia/deepstream:6.0-devel

NVIDIA GPU Driver Version (valid for GPU only)
NVIDIA-SMI 495.46 Driver Version: 495.46 CUDA Version: 11.5

Issue Type( questions, new requirements, bugs)
I have been trying to replicate the Darknet YoloV4 results on the COCO dataset because I really like the TAO workflow, but my accuracy (mAP) is consistently lower than Darknet's.

Given the resources at your disposal, could you produce a training spec (bonus points for an official model on NVIDIA NGC) that achieves per-class accuracy similar to the numbers below? They were calculated by running the official Darknet yolov4.weights and yolov4.cfg against the COCO2017 validation set (5000 images). I am sure this would be a very helpful starting point for training custom YoloV4 models.

class_id = 0, name = person, ap = 79.29%   	 (TP = 7956, FP = 3157) 
class_id = 1, name = bicycle, ap = 60.28%   	 (TP = 173, FP = 94) 
class_id = 2, name = car, ap = 68.93%   	 (TP = 1290, FP = 703) 
class_id = 3, name = motorcycle, ap = 74.49%   	 (TP = 266, FP = 134) 
class_id = 4, name = airplane, ap = 90.64%   	 (TP = 124, FP = 26) 
class_id = 5, name = bus, ap = 84.81%   	 (TP = 221, FP = 56) 
class_id = 6, name = train, ap = 93.01%   	 (TP = 168, FP = 37) 
class_id = 7, name = truck, ap = 61.90%   	 (TP = 253, FP = 214) 
class_id = 8, name = boat, ap = 54.23%   	 (TP = 223, FP = 132) 
class_id = 9, name = traffic light, ap = 55.11%   	 (TP = 371, FP = 216) 
class_id = 10, name = fire hydrant, ap = 89.23%   	 (TP = 86, FP = 11) 
class_id = 11, name = stop sign, ap = 77.69%   	 (TP = 56, FP = 19) 
class_id = 12, name = parking meter, ap = 68.42%   	 (TP = 38, FP = 14) 
class_id = 13, name = bench, ap = 43.16%   	 (TP = 178, FP = 195) 
class_id = 14, name = bird, ap = 53.50%   	 (TP = 223, FP = 102) 
class_id = 15, name = cat, ap = 90.56%   	 (TP = 167, FP = 52) 
class_id = 16, name = dog, ap = 82.53%   	 (TP = 178, FP = 65) 
class_id = 17, name = horse, ap = 85.73%   	 (TP = 226, FP = 70) 
class_id = 18, name = sheep, ap = 78.52%   	 (TP = 287, FP = 136) 
class_id = 19, name = cow, ap = 80.75%   	 (TP = 287, FP = 93) 
class_id = 20, name = elephant, ap = 87.41%   	 (TP = 228, FP = 64) 
class_id = 21, name = bear, ap = 92.45%   	 (TP = 62, FP = 5) 
class_id = 22, name = zebra, ap = 91.89%   	 (TP = 226, FP = 41) 
class_id = 23, name = giraffe, ap = 93.04%   	 (TP = 206, FP = 33) 
class_id = 24, name = backpack, ap = 33.61%   	 (TP = 132, FP = 189) 
class_id = 25, name = umbrella, ap = 69.18%   	 (TP = 283, FP = 163) 
class_id = 26, name = handbag, ap = 33.49%   	 (TP = 196, FP = 262) 
class_id = 27, name = tie, ap = 57.87%   	 (TP = 140, FP = 74) 
class_id = 28, name = suitcase, ap = 71.06%   	 (TP = 201, FP = 112) 
class_id = 29, name = frisbee, ap = 88.07%   	 (TP = 99, FP = 34) 
class_id = 30, name = skis, ap = 51.67%   	 (TP = 118, FP = 73) 
class_id = 31, name = snowboard, ap = 56.31%   	 (TP = 39, FP = 23) 
class_id = 32, name = sports ball, ap = 62.70%   	 (TP = 168, FP = 87) 
class_id = 33, name = kite, ap = 67.57%   	 (TP = 218, FP = 135) 
class_id = 34, name = baseball bat, ap = 61.75%   	 (TP = 83, FP = 36) 
class_id = 35, name = baseball glove, ap = 65.70%   	 (TP = 95, FP = 44) 
class_id = 36, name = skateboard, ap = 79.67%   	 (TP = 142, FP = 33) 
class_id = 37, name = surfboard, ap = 63.34%   	 (TP = 163, FP = 83) 
class_id = 38, name = tennis racket, ap = 85.23%   	 (TP = 188, FP = 64) 
class_id = 39, name = bottle, ap = 58.25%   	 (TP = 583, FP = 424) 
class_id = 40, name = wine glass, ap = 58.73%   	 (TP = 180, FP = 112) 
class_id = 41, name = cup, ap = 64.70%   	 (TP = 567, FP = 425) 
class_id = 42, name = fork, ap = 59.49%   	 (TP = 117, FP = 93) 
class_id = 43, name = knife, ap = 35.51%   	 (TP = 107, FP = 113) 
class_id = 44, name = spoon, ap = 36.94%   	 (TP = 89, FP = 139) 
class_id = 45, name = bowl, ap = 61.86%   	 (TP = 382, FP = 320) 
class_id = 46, name = banana, ap = 43.44%   	 (TP = 152, FP = 144) 
class_id = 47, name = apple, ap = 29.17%   	 (TP = 83, FP = 115) 
class_id = 48, name = sandwich, ap = 57.77%   	 (TP = 97, FP = 84) 
class_id = 49, name = orange, ap = 40.90%   	 (TP = 139, FP = 173) 
class_id = 50, name = broccoli, ap = 45.10%   	 (TP = 139, FP = 156) 
class_id = 51, name = carrot, ap = 35.05%   	 (TP = 162, FP = 275) 
class_id = 52, name = hot dog, ap = 54.20%   	 (TP = 60, FP = 36) 
class_id = 53, name = pizza, ap = 73.71%   	 (TP = 207, FP = 95) 
class_id = 54, name = donut, ap = 62.85%   	 (TP = 222, FP = 154) 
class_id = 55, name = cake, ap = 62.36%   	 (TP = 188, FP = 126) 
class_id = 56, name = chair, ap = 56.48%   	 (TP = 998, FP = 835) 
class_id = 57, name = couch, ap = 65.76%   	 (TP = 165, FP = 125) 
class_id = 58, name = potted plant, ap = 52.67%   	 (TP = 192, FP = 198) 
class_id = 59, name = bed, ap = 72.57%   	 (TP = 113, FP = 52) 
class_id = 60, name = dining table, ap = 47.17%   	 (TP = 368, FP = 401) 
class_id = 61, name = toilet, ap = 85.77%   	 (TP = 150, FP = 42) 
class_id = 62, name = tv, ap = 83.08%   	 (TP = 230, FP = 82) 
class_id = 63, name = laptop, ap = 80.98%   	 (TP = 180, FP = 74) 
class_id = 64, name = mouse, ap = 82.85%   	 (TP = 85, FP = 34) 
class_id = 65, name = remote, ap = 60.85%   	 (TP = 166, FP = 115) 
class_id = 66, name = keyboard, ap = 76.71%   	 (TP = 115, FP = 70) 
class_id = 67, name = cell phone, ap = 62.18%   	 (TP = 165, FP = 97) 
class_id = 68, name = microwave, ap = 77.63%   	 (TP = 44, FP = 22) 
class_id = 69, name = oven, ap = 65.43%   	 (TP = 90, FP = 65) 
class_id = 70, name = toaster, ap = 60.70%   	 (TP = 5, FP = 5) 
class_id = 71, name = sink, ap = 65.99%   	 (TP = 148, FP = 80) 
class_id = 72, name = refrigerator, ap = 81.52%   	 (TP = 100, FP = 47) 
class_id = 73, name = book, ap = 26.10%   	 (TP = 298, FP = 378) 
class_id = 74, name = clock, ap = 73.27%   	 (TP = 200, FP = 77) 
class_id = 75, name = vase, ap = 58.27%   	 (TP = 175, FP = 153) 
class_id = 76, name = scissors, ap = 51.90%   	 (TP = 17, FP = 8) 
class_id = 77, name = teddy bear, ap = 71.03%   	 (TP = 134, FP = 63) 
class_id = 78, name = hair drier, ap = 7.12%   	 (TP = 1, FP = 3) 
class_id = 79, name = toothbrush, ap = 40.25%   	 (TP = 27, FP = 28)

 for conf_thresh = 0.25, precision = 0.69, recall = 0.65, F1-score = 0.67 
 for conf_thresh = 0.25, TP = 24077, FP = 10831, FN = 12704, average IoU = 56.97 % 
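As a sanity check on the two summary lines above, precision, recall and F1 at conf_thresh = 0.25 follow directly from the TP/FP/FN counts. A minimal Python sketch (the function name `prf1` is illustrative only, not part of darknet):

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from detection counts at a fixed confidence threshold."""
    precision = tp / (tp + fp)  # fraction of detections that match a ground-truth box
    recall = tp / (tp + fn)     # fraction of ground-truth boxes that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Counts from the darknet output above (conf_thresh = 0.25, IoU threshold = 50 %)
p, r, f1 = prf1(24077, 10831, 12704)
print(f"precision = {p:.2f}, recall = {r:.2f}, F1-score = {f1:.2f}")
# prints: precision = 0.69, recall = 0.65, F1-score = 0.67
```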

 IoU threshold = 50 %, used Area-Under-Curve for each unique Recall 
 mean average precision (mAP@0.50) = 0.703672, or 70.37 %
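"Area-Under-Curve for each unique Recall" refers to all-point interpolated AP: the precision curve is made monotonically non-increasing and its area is summed wherever recall changes; mAP@0.50 is then the plain mean of the 80 per-class APs listed above. The Python sketch below shows the standard computation for one class. It is a simplified illustration, not darknet's source, and it assumes the IoU >= 0.5 matching of detections to ground-truth boxes has already been done.

```python
def average_precision(detections: list[tuple[float, bool]], num_gt: int) -> float:
    """All-point AP: area under the precision-recall curve, summed at each unique recall.

    detections: (confidence, is_true_positive) pairs for one class.
    num_gt: number of ground-truth boxes of that class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, is_tp in ranked:  # sweep the confidence threshold from high to low
        tp += is_tp
        fp += not is_tp
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Pad the curve, then replace each precision by the max precision to its right
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas only where recall actually changes
    return sum(
        (mrec[i] - mrec[i - 1]) * mpre[i]
        for i in range(1, len(mrec))
        if mrec[i] != mrec[i - 1]
    )


def mean_average_precision(per_class_ap: list[float]) -> float:
    """mAP@0.50 is the unweighted mean of the per-class APs."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three ranked detections (TP, FP, TP) against two ground-truth boxes give an AP of 5/6: recall reaches 0.5 at precision 1.0, then 1.0 at precision 2/3.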

Morganh · February 9, 2022, 6:16am

May I know how you trained the model and got the above result?

vcmike · February 9, 2022, 6:28am

Hi.
This was produced by running the command below, as described in the docs of the Darknet repository on GitHub (AlexeyAB/darknet).

The yolov4.cfg is the official configuration file from here: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg

The yolov4.weights file is the official weights file for YoloV4 from here: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

The .cfg and .weights files are the official files used to produce the current #14 place on the Papers With Code COCO benchmark (Real-Time Object Detection).

The images for running this validation are the 5000 validation images available from the Coco Downloads page: http://images.cocodataset.org/zips/val2017.zip
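`darknet detector map` also needs ground-truth labels in darknet's YOLO text format: one `.txt` per image with the same base name, each line `class x_center y_center width height`, normalized to [0, 1]. The thread does not say how these were produced, so the Python sketch below is only one possible conversion from the COCO 2017 annotation file (`instances_val2017.json`). Skipping crowd annotations and mapping COCO category ids to 0..79 in ascending id order (the order of the class list above) are assumptions, and `coco_to_yolo` / `write_labels` are hypothetical helpers.

```python
import json
from pathlib import Path


def coco_to_yolo(ann: dict) -> dict[str, list[str]]:
    """Map every image's base name to its YOLO-format label lines (possibly empty)."""
    # COCO category ids run 1..90 with gaps; assume darknet's 80 classes follow ascending id order
    cat_map = {cid: i for i, cid in enumerate(sorted(c["id"] for c in ann["categories"]))}
    images = {img["id"]: img for img in ann["images"]}
    labels = {Path(img["file_name"]).stem: [] for img in ann["images"]}
    for a in ann["annotations"]:
        if a.get("iscrowd", 0):  # assumption: crowd regions are not used as ground truth
            continue
        img = images[a["image_id"]]
        img_w, img_h = img["width"], img["height"]
        x, y, w, h = a["bbox"]  # COCO boxes: [top-left x, top-left y, width, height] in pixels
        labels[Path(img["file_name"]).stem].append(
            f"{cat_map[a['category_id']]} "
            f"{(x + w / 2) / img_w:.6f} {(y + h / 2) / img_h:.6f} "
            f"{w / img_w:.6f} {h / img_h:.6f}"
        )
    return labels


def write_labels(ann_file: str, out_dir: str) -> None:
    """Write one <image stem>.txt per image into out_dir (empty file if no boxes)."""
    ann = json.loads(Path(ann_file).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for stem, lines in coco_to_yolo(ann).items():
        (out / f"{stem}.txt").write_text("".join(line + "\n" for line in lines))
```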

darknet detector map ./obj.data yolov4.cfg yolov4.weights
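The contents of ./obj.data were not posted. A darknet .data file is a short `key = value` text file; the Python sketch below builds one for evaluation, under the assumption that `valid` points at a list of validation image paths and `names` at the 80 class names in class_id order (for example data/coco.names from the darknet repository). `make_obj_data`, `write_valid_list` and all file names are hypothetical.

```python
from pathlib import Path


def make_obj_data(valid_list: str, names_file: str, classes: int = 80,
                  backup: str = "backup/") -> str:
    """Return the text of a darknet .data file suitable for `darknet detector map`."""
    return (
        f"classes = {classes}\n"
        f"valid = {valid_list}\n"   # text file listing one validation image path per line
        f"names = {names_file}\n"   # class names, one per line, in class_id order
        f"backup = {backup}\n"
    )


def write_valid_list(image_dir: str, list_file: str) -> int:
    """Write the absolute path of every .jpg in image_dir to list_file; return the count."""
    paths = sorted(str(p.resolve()) for p in Path(image_dir).glob("*.jpg"))
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return len(paths)
```

Typical use would be `write_valid_list("val2017", "val2017.txt")` after unzipping val2017.zip, then `Path("obj.data").write_text(make_obj_data("val2017.txt", "coco.names"))`.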

I am hopeful that you are able to reproduce the same results with TAO.

Morganh · February 10, 2022, 1:56am

Thanks for the info. I will check further.
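Also, for the YOLOv3 SOTA experiment, please refer to the blog post "Preparing State-of-the-Art Models for Classification and Object Detection with TAO Toolkit" (https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tao-toolkit/).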

vcmike · February 11, 2022, 12:35am

Thank you @Morganh. I have taken the ‘SOTA’ YOLOv3 specs and adapted them for YOLOv4 (on a subset of classes). I will report back with a comparison to Darknet once training is done.

Morganh · February 11, 2022, 3:22am

Please note that for TAO 21.11, the yolov4_config should set:
loss_loc_weight: 1.0
loss_neg_obj_weights: 1.0
loss_class_weights: 1.0

More settings are described in https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#creating-a-configuration-file
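When adapting the YOLOv3 'SOTA' spec for YOLOv4, the three loss weights above can also be set programmatically. A minimal Python sketch, assuming the spec is the usual protobuf text format with one `field: value` per line; `set_spec_fields` is a hypothetical helper, not a TAO tool, and it only rewrites fields that already exist in the file.

```python
import re

# Loss weights suggested above for YOLOv4 in TAO 21.11
YOLOV4_LOSS_WEIGHTS = {
    "loss_loc_weight": 1.0,
    "loss_neg_obj_weights": 1.0,
    "loss_class_weights": 1.0,
}


def set_spec_fields(spec_text: str, overrides: dict[str, float]) -> str:
    """Rewrite `field: value` lines in a TAO text spec; fail loudly if a field is missing."""
    for field, value in overrides.items():
        # Keep the indentation and "field:" prefix (group 1), replace only the value token
        pattern = rf"^(\s*{re.escape(field)}\s*:\s*)\S+"
        spec_text, count = re.subn(pattern, rf"\g<1>{value}", spec_text, flags=re.MULTILINE)
        if count == 0:
            raise KeyError(f"{field} not found in spec")
    return spec_text
```

Raising on a missing field is deliberate: a silently unchanged spec would be easy to miss when comparing runs.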

Morganh · February 11, 2022, 6:45am

Also, before training YOLOv4, you need to train a good pretrained model on the ImageNet dataset using the classification network. See the blog post "Preparing State-of-the-Art Models for Classification and Object Detection with TAO Toolkit".

vcmike · February 12, 2022, 4:03am

Thanks @Morganh. Do NVIDIA's pretrained models, such as nvidia/tao/pretrained_object_detection:cspdarknet53, meet your definition of a ‘good pretrained model’?

Morganh · February 12, 2022, 5:12am

It does not. That model was trained on the Open Images dataset.
Please follow https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tao-toolkit/ to train a classification model on the ImageNet dataset. Due to copyright issues, we can’t provide the ImageNet dataset or any ImageNet-pretrained models in TAO Toolkit.

Morganh · February 12, 2022, 1:16pm

To train a classification model with the cspdarknet53 backbone, please change
arch: "darknet"
to
arch: "cspdarknet"

in the darknet53.txt spec on the release/tao3.0 branch of NVIDIA-AI-IOT/deepstream_tao_apps on GitHub.

vcmike · February 13, 2022, 11:19pm

Thank you @Morganh. This is training now and should complete in the next week or two :D

Please leave this topic open so I can update once this step is complete.

Morganh · February 25, 2022, 4:54am

Also, please note that, to match the official GitHub repository, you should use COCO 2014 instead of COCO 2017.
The official Darknet pretrained model (PTM) is trained on COCO 2014.

The training and validation images should be:

trainval2014 :117264 images
val5k: 4954 images

See the details in AlexeyAB/darknet GitHub issue #5751 ("Inconsistent splits between COCO 2014 and COCO 2017?")
and https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tao-toolkit/

Download the COCO 2014 dataset from the COCO website. To compare with the SOTA model, do the training/testing split the same way as the original author. Also, the author’s training/validation split is different from the COCO 2014 official training/validation split and can be reproduced by the get_coco_dataset.sh bash file.

Using the bash file, get 5k.txt and no5k.txt. Those are the file names for validation and training images/labels. After preparing the data following the COCO 2014 data preparation section, merge the original training/validation set and re-split it according to those two files.
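The merge-and-resplit step can be sketched in Python as below. This is an illustration under assumptions, not the script from the blog post: `resplit` is a hypothetical helper, images are matched by file name because the paths written into 5k.txt may point at another directory, and every merged image that is not in 5k.txt is treated as training. Comparing the resulting training list against no5k.txt is a useful cross-check.

```python
from pathlib import Path


def resplit(all_images: list[str], val_list: list[str]) -> tuple[list[str], list[str]]:
    """Split merged COCO 2014 train+val image paths into (train, val) using the 5k.txt lines.

    Matching is by file name, so the directory prefixes in 5k.txt do not need to match.
    """
    val_names = {Path(line.strip()).name for line in val_list if line.strip()}
    train, val = [], []
    for path in all_images:
        (val if Path(path).name in val_names else train).append(path)
    return train, val
```

For example, `resplit(merged_paths, Path("5k.txt").read_text().splitlines())`, where `merged_paths` is a hypothetical list of every train2014 and val2014 image path.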

vcmike · March 2, 2022, 5:06am

Just to provide an update: I have successfully trained a cspdarknet53 backbone from scratch on ImageNet2012. After 200 epochs it reached a Top-1 accuracy of 77.19%, which is in line with expected results. There may be further gains from trying activation: mish, which I may do.
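For reference, the Mish activation mentioned above is defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The scalar Python sketch below only illustrates the formula; it is not TAO's implementation.

```python
import math


def softplus(x: float) -> float:
    """ln(1 + e^x), rewritten as max(x, 0) + ln(1 + e^-|x|) to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: smooth and non-monotonic; ~x for large positive x, ~0 for large negative x."""
    return x * math.tanh(softplus(x))
```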

I will now try to train a COCO model with this backbone and the Darknet COCO 2014 split to confirm TAO produces similar results.

Morganh · March 2, 2022, 6:09am

May I know whether that is test accuracy?

vcmike · March 2, 2022, 6:15am

That was validation set accuracy. Training accuracy finished at 74.09%. I feel that it may improve with a few more epochs still.

Morganh · March 2, 2022, 6:20am

Here are our internal results for two kinds of models:
1st: default activation, 200 epochs. training accuracy: 0.769, val_accuracy: 0.7813
2nd: activation:mish . 300 epochs. training accuracy: 0.79998, val_accuracy: 0.7883575

vcmike · March 2, 2022, 6:24am

Great, this is very useful information. I used multi-GPU training for part of the run (which I believe affects the results), so I will continue training until I reach your results.

system · March 16, 2022, 6:25am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
