SqueezeNet

Hi,
Based on the article “Finetuning Torchvision Models” (Finetuning Torchvision Models — PyTorch Tutorials 1.2.0 documentation), I wrote a Python script that trains all the networks mentioned there. For AlexNet, ResNet, DenseNet, VGG and ResNeXt, training goes well: the loss decreases as expected and the networks classify the images accurately. The ROC values are good as well. SqueezeNet is the exception: the loss stays high, and after a few epochs the network produces the same prediction for every image. This happens with both single-label and multi-label classification.
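When the loss plateaus and every image gets the same prediction, a useful reference point is the loss of a model that ignores its input and always predicts each label's frequency in the data. Under binary cross-entropy, that constant predictor scores the binary entropy of each label's prevalence. The sketch below computes this baseline in plain Python; `mean_bce` and `constant_baseline` are hypothetical helper names for illustration (not part of the script or of PyTorch), and the toy labels are made up.

```python
import math
from typing import Sequence


def mean_bce(probs: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over flat lists, like nn.BCELoss(reduction='mean')."""
    eps = 1e-12  # avoid log(0); nn.BCELoss clamps its log terms instead
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def constant_baseline(labels: Sequence[Sequence[int]]) -> float:
    """Loss of a model that ignores the image and always predicts each label's frequency."""
    n_samples = len(labels)
    n_classes = len(labels[0])
    prevalence = [sum(row[c] for row in labels) / n_samples for c in range(n_classes)]
    probs = [prevalence[c] for _ in labels for c in range(n_classes)]
    targets = [row[c] for row in labels for c in range(n_classes)]
    return mean_bce(probs, targets)


if __name__ == "__main__":
    # Hypothetical multi-label targets: 4 images, 3 classes
    toy_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]
    print(f"constant-predictor BCE: {constant_baseline(toy_labels):.4f}")
```

If the SqueezeNet loss sits at roughly this value for the real training labels while the other networks go well below it, the network has learned little beyond the label frequencies.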
Since I use the same training code for every network and only change the model definition, I suspect I am initializing SqueezeNet incorrectly:
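```python
import torch.nn as nn
from torchvision import models

model = models.squeezenet1_1(pretrained=usePreTrain)
# replace the final 1x1 conv so the head outputs nnClassCount channels
model.classifier[1] = nn.Conv2d(512, nnClassCount, kernel_size=(1, 1), stride=(1, 1))
set_parameter_requires_grad(model, feature_extract)
model.num_classes = nnClassCount
criterion = nn.BCELoss(reduction='mean')  # equivalent to the deprecated size_average=True
```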
I uploaded the full code to GitHub (GitHub - GastonLagaffe2013/PyTorchModels).

Any help is appreciated, Mathias