Yolo_v4_tiny configs (anchor shapes)

• Network Type (Tiny_Yolo_v4)
• TLT Version (nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3)


I want to train an object detector on my custom dataset. I'm following the instructions in the Jupyter notebook, but the part about anchor shapes is a little confusing.

It says:

If you use your own dataset, you will need to run the code below to generate the best anchor shape
The anchor shape generated by this script is sorted.

Write the first 3 into small_anchor_shape in the config file.
Write middle 3 into mid_anchor_shape.
Write last 3 into big_anchor_shape.

!tao yolo_v4_tiny kmeans -l $DATA_DOWNLOAD_DIR/training/labels \
                         -i $DATA_DOWNLOAD_DIR/training/images \
                         -n 9 \
                         -x 1248 \
                         -y 384

  • Half of my dataset is 512*384 and the other half is 384*512. What values should I use for x and y?
  • The command above produces three groups (small_anchor_shape, mid_anchor_shape, big_anchor_shape), but the config file has only two (mid_anchor_shape, big_anchor_shape). How should I use them?
  • In the augmentation_config section of the config file there are output_width: 1248 and output_height: 384. Are they related to x and y in the command above, or to the image sizes in my dataset?

I tried some combinations at random. Training converges well, but the predicted boxes are not very good: they are wider than expected.

The output_width and output_height depend on the resolution you want to train the model at. For example, to train a 512*384 model, set output_width: 512 and output_height: 384.

For the yolo_v4_tiny network, only big_anchor_shape and mid_anchor_shape need to be set.

In your case, please resize all images to the same resolution and rescale the labels accordingly. Then run kmeans on the new images and new labels.
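
A minimal sketch of the label-rescaling step, assuming the labels are KITTI-format text files in which fields 5 to 8 of each line are xmin, ymin, xmax, ymax in pixels. The function names and the sample label line are made up for illustration, and the images themselves would still need to be resized separately with an image library.

```python
def scale_factors(src_size, dst_size):
    """Return (sx, sy) for resizing from src (width, height) to dst (width, height)."""
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return dst_w / src_w, dst_h / src_h


def scale_kitti_line(line, sx, sy):
    """Scale the 2D box of one label line (assumed KITTI layout).

    Fields at index 4..7 are taken to be xmin, ymin, xmax, ymax.
    All other fields are passed through unchanged.
    """
    fields = line.split()
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{xmin * sx:.2f}",
        f"{ymin * sy:.2f}",
        f"{xmax * sx:.2f}",
        f"{ymax * sy:.2f}",
    ]
    return " ".join(fields)


# Hypothetical example: a 384*512 (portrait) image resized to 512*384.
sx, sy = scale_factors((384, 512), (512, 384))
example = "car 0.00 0 0.00 96.00 128.00 192.00 256.00 0 0 0 0 0 0 0"
scaled = scale_kitti_line(example, sx, sy)
print(scaled)
```

Note that resizing a portrait image to a landscape resolution this way stretches the objects, so the box shapes fed to kmeans change as well. That is the reason for running kmeans only after both images and labels have been converted.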


Thanks for your response. I have another question about the anchor shapes.

These are the anchor shapes produced for my dataset:
(16.00, 19.00) (1)
(33.00, 39.00) (2)
(61.00, 89.00) (3)

(147.50, 44.00) (4)
(173.00, 65.00) (5)
(167.00, 92.00) (6)

(95.00, 171.00) (7)
(202.00, 124.50) (8)
(251.00, 184.00) (9)

What is the correct order?

[(small),(mid),(big)] (summation of the booleans)

big_anchor_shape: "[7, 8, 9]"

mid_anchor_shape: "[6, 5, 4]"

small_anchor_shape: "[3, 2, 1]"


big_anchor_shape: "[8, 7, 9]"

mid_anchor_shape: "[5, 4, 6]"

small_anchor_shape: "[2, 1, 3]"


big_anchor_shape: "[9, 8, 7]"

mid_anchor_shape: "[6, 5, 4]"

small_anchor_shape: "[3, 2, 1]"

Please refer to https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html#creating-a-configuration-file
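
As a rough illustration of the notebook rule quoted earlier in the thread (first three anchors into small_anchor_shape, middle three into mid_anchor_shape, last three into big_anchor_shape), the grouping can be sketched as below. The helper names are hypothetical, and the string format is assumed from the configuration examples in the linked documentation. The thread does not confirm whether the order inside a group matters, so this only keeps the anchors in the order kmeans printed them.

```python
# The nine anchors posted above, in the order the kmeans command printed them.
anchors = [
    (16.00, 19.00), (33.00, 39.00), (61.00, 89.00),
    (147.50, 44.00), (173.00, 65.00), (167.00, 92.00),
    (95.00, 171.00), (202.00, 124.50), (251.00, 184.00),
]


def group_anchors(shapes, group_size=3):
    """Split the anchor list into consecutive groups without reordering it."""
    return [shapes[i:i + group_size] for i in range(0, len(shapes), group_size)]


def to_config_value(group):
    """Format one group as a '[(w, h), ...]' string (assumed config format)."""
    return "[" + ", ".join(f"({w:.2f}, {h:.2f})" for w, h in group) + "]"


small, mid, big = group_anchors(anchors)

# The posted list is already in ascending order of area (width * height).
areas = [w * h for w, h in anchors]
print("sorted by area:", areas == sorted(areas))

for name, group in (
    ("small_anchor_shape", small),
    ("mid_anchor_shape", mid),
    ("big_anchor_shape", big),
):
    print(f'{name}: "{to_config_value(group)}"')
```

For yolo_v4_tiny, the earlier reply says that only big_anchor_shape and mid_anchor_shape are set in the config. The sketch does not decide which anchors to keep in that case.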
