Applying the knowledge distillation technique, I created a student model that I use as the base_net in an SSD network for object detection tasks. Since I want to preserve the accuracy of the student model, I noticed that the training script makes it possible to freeze layers during training.
My idea is to train only the remaining convolutional layers (the ones added by the SSD architecture) while leaving the base_net unchanged.
My question is: will the whole model still be able to learn the bounding boxes provided by the loader during training?
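For what it's worth, here is a minimal sketch of this setup, assuming standard PyTorch. The `TinyBaseNet` and `TinySSD` modules below are toy stand-ins (not your distilled student or the real SSD code): the point is only how to freeze `base_net` and give the optimizer just the trainable parameters.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the distilled student used as base_net
class TinyBaseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

# Toy SSD-like wrapper: frozen backbone plus extra trainable convs
class TinySSD(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_net = TinyBaseNet()
        # extra convolutional layers added by the SSD architecture
        self.extras = nn.Conv2d(16, 32, 3, padding=1)

    def forward(self, x):
        return self.extras(self.base_net(x))

model = TinySSD()

# Freeze base_net: no gradients will be computed for its parameters
for p in model.base_net.parameters():
    p.requires_grad = False

# Pass only the still-trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

With this, backpropagation still flows *through* the frozen base_net to compute activations, but only the SSD extras get weight updates, which is the usual feature-extractor fine-tuning pattern.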
If the base model is used as a feature extractor, training the remaining layers should work.
Actually, we don’t have too much experience in knowledge distillation.
Maybe other users who have tried something similar can comment more on this.
Which training are you referring to: that of the entire model, or only that of the convolutional layers added by the SSD architecture?
It is possible to freeze the base_net weights so that they remain unchanged during training.
But it's recommended to look for relevant papers for more information first.
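To double-check that frozen weights really stay fixed after an optimizer step, a quick sanity test like the one below can help. The two-layer `net` is a toy model (the first layer plays the role of your base_net); this is just a sketch, not your training script.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: treat the first layer as the frozen "base_net"
net = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
for p in net[0].parameters():
    p.requires_grad = False

# Snapshot the frozen weights before training
before = net[0].weight.clone()

opt = torch.optim.SGD((p for p in net.parameters() if p.requires_grad), lr=0.1)

# One dummy training step
loss = net(torch.randn(5, 4)).sum()
loss.backward()
opt.step()

# The frozen layer is untouched; only the second layer was updated
frozen_unchanged = torch.equal(net[0].weight, before)
```

Running a step like this against your actual model before a long training run is a cheap way to confirm the freezing logic does what you expect.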
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.