Hello everyone.
Applying the knowledge distillation technique, I created a student model that I use as the base_net of an SSD network for object detection. Since I want to preserve the student model's accuracy, I noticed that the training script makes it possible to freeze layers during training.
My idea is to train only the remaining convolutional layers (the ones added by the SSD architecture), leaving the base_net model unchanged.
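In case it helps the discussion, here is a minimal sketch of that setup in PyTorch. The module names (`TinyStudent`, `TinySSD`, `extras`, the head layers) are illustrative stand-ins, not the actual script's classes; the key idea is simply setting `requires_grad = False` on the base_net parameters and passing only the trainable parameters to the optimizer:

```python
import torch
import torch.nn as nn

class TinyStudent(nn.Module):
    """Stand-in for the distilled student model used as base_net."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class TinySSD(nn.Module):
    """Stand-in SSD-style detector: frozen base_net + trainable extra convs."""
    def __init__(self, num_classes=21, boxes_per_cell=4):
        super().__init__()
        self.base_net = TinyStudent()
        # Extra convolutional layers added by the SSD architecture.
        self.extras = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.regression_head = nn.Conv2d(64, boxes_per_cell * 4, 3, padding=1)
        self.classification_head = nn.Conv2d(64, boxes_per_cell * num_classes, 3, padding=1)

    def forward(self, x):
        x = self.extras(self.base_net(x))
        return self.regression_head(x), self.classification_head(x)

model = TinySSD()

# Freeze the distilled base_net so its weights stay unchanged during training.
for p in model.base_net.parameters():
    p.requires_grad = False

# Give the optimizer only the parameters that should still learn
# (extras + detection heads).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```

With this, gradients still flow *through* the frozen base_net to nothing upstream, but the SSD-specific layers and heads keep updating normally.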
The question is: during training, will the whole model still be able to learn the boxes present in the loader?
Which training are you referring to: that of the entire model, or only the part reserved for the convolutional layers added by the SSD architecture?