Transfer Learning using the freeze_blocks property of the YOLOv4 config with a ResNet-18 architecture

TLT Version → docker_tag: v3.21.08-py3
Network Type → YOLOv4

Hi,

I am trying to understand the concept of the freeze_blocks property for the ResNet-18 architecture. I trained on my custom dataset for 100 epochs and got an mAP of around 84% without using the freeze_blocks property. I then analysed the mAP variation by training with different freeze_blocks values of 0, 1, 2 and 3, and got mAPs of 62%, 86%, 70% and 90% respectively. The documentation says: "A general principle to keep in mind is that the smaller the block ID, the closer it is to the model input; the larger the block ID, the closer it is to the model output."

So, my questions are: how many layers does a specific freeze_blocks ID freeze? And since I used different freeze_blocks IDs and got different mAPs, which value is best for getting better accuracy while saving computational time? Without freeze_blocks I get an mAP of around 84%, but with freeze_blocks set to 0 it decreases to 62%, which is not good. So what is the benefit of using this property?

As I am a beginner in this field, I just want to know how the network behaves when I train with different freeze_blocks values. Could you please explain this transfer learning concept in detail so that I can understand it better?

Hi,

I am waiting for a response from your side.

The weights of the layers in those blocks will be frozen during training.
As mentioned in YOLOv4 — TAO Toolkit 3.21.11 documentation
The list of block IDs to be frozen in the model during training. You can choose to freeze some of the CNN blocks in the model to make the training more stable and/or easier to converge. The definition of a block is heuristic for a specific architecture (for example, by stride or by logical blocks in the model). However, the block ID numbers identify the blocks in the model in a sequential order so you don’t have to know the exact locations of the blocks when you do training. A general principle to keep in mind is that the smaller the block ID, the closer it is to the model input; the larger the block ID, the closer it is to the model output.
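For reference, freeze_blocks is specified as a repeated field inside yolov4_config in the training spec. A rough fragment, with the surrounding fields abbreviated (exact field names and values should be checked against the TAO YOLOv4 documentation for your version), might look like:

```
yolov4_config {
  arch: "resnet"
  nlayers: 18
  # Freeze the two blocks closest to the input; repeat the
  # field once per block ID you want frozen.
  freeze_blocks: 0
  freeze_blocks: 1
  freeze_bn: false
  # ... other yolov4_config fields ...
}
```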

Can you elaborate on this in relation to what I actually wrote, beyond what is written in the documentation? Please read my full query so that you can understand what exactly I am trying to analyse.

As mentioned above, the definition of a block is heuristic for a specific architecture (for example, by stride or by logical blocks in the model). However, the block ID numbers identify the blocks in the model in a sequential order so you don’t have to know the exact locations of the blocks when you do training.

How many training images are in your custom dataset? Can this result be reproduced every time? I suggest you also try training on a public dataset (the KITTI dataset).

The weights of the layers in those blocks will be frozen during training. The training will be more stable and/or easier to converge.
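To illustrate what "frozen" means, here is a toy sketch (plain Python, not the TAO implementation): blocks are just lists of scalar weights, and any block whose ID is in freeze_blocks is skipped during the SGD update, so it keeps its pretrained values.

```python
# Toy illustration of block freezing during gradient descent.
# Frozen blocks are skipped by the optimizer, so their
# (pretrained) weights are left unchanged.

def train_step(blocks, grads, freeze_blocks, lr=0.1):
    """Apply one SGD update, skipping any block ID listed in freeze_blocks."""
    for block_id, (weights, g) in enumerate(zip(blocks, grads)):
        if block_id in freeze_blocks:
            continue  # frozen: keep the pretrained weights as-is
        for i in range(len(weights)):
            weights[i] -= lr * g[i]
    return blocks

# Two "blocks" of weights, e.g. from a pretrained backbone.
blocks = [[1.0, 1.0], [2.0, 2.0]]
grads = [[0.5, 0.5], [0.5, 0.5]]

train_step(blocks, grads, freeze_blocks=[0])
print(blocks)  # → [[1.0, 1.0], [1.95, 1.95]]: block 0 unchanged, block 1 updated
```

Freezing the early blocks keeps the generic low-level features learned on the pretraining dataset and reduces the number of trainable parameters, which is why it can stabilize training; whether it helps accuracy depends on how similar your data is to the pretraining data, which matches the mAP variation you observed.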