Hi Morganh,
I am training a classification model using TLT-v2. My classes are Mask and No_Mask, and the backbone is ResNet-18. I have about 1700 images for Mask and the same for No_Mask. The minimum image size is 46x55, the maximum is 496x560, and the average is around 96x96. My detection model is PeopleNet, which I use for person and face detection; on each detected face I run Mask/No_Mask classification.
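For reference, the size statistics above can be gathered with a short script. This is a minimal sketch: reading each image's `(width, height)` is assumed to be done with an image library such as Pillow (`Image.open(path).size`); only the aggregation is shown so the snippet stays self-contained.

```python
# Minimal sketch of computing min/max/mean image sizes for a dataset.
# The (width, height) tuples are assumed to come from an image library,
# e.g. Pillow: sizes = [Image.open(p).size for p in Path(root).rglob("*.jpg")]

def size_stats(sizes):
    """Return ((min_w, min_h), (max_w, max_h), (mean_w, mean_h))
    over a list of (width, height) tuples."""
    ws = [w for w, _ in sizes]
    hs = [h for _, h in sizes]
    n = len(sizes)
    return ((min(ws), min(hs)),
            (max(ws), max(hs)),
            (sum(ws) // n, sum(hs) // n))

if __name__ == "__main__":
    # Hypothetical sample matching the extremes quoted above.
    sample = [(46, 55), (96, 96), (496, 560)]
    print(size_stats(sample))
```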
The question is that I am getting consistently poor results on Mask/No_Mask classification. My config parameters are below:
```
model_config {
  arch: "resnet"
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,96,96"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/split/train"
  val_dataset_path: "/workspace/tlt-experiments/data/split/val"
  pretrained_model_path: "/workspace/tlt-experiments/classification/pretrained_resnet18/tlt_pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer: "sgd"
  batch_size_per_gpu: 64
  n_epochs: 1000
  n_workers: 16
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    scheduler: "soft_anneal"
    learning_rate: 0.006
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10
  }
}
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/split/test"
  model_path: "/workspace/tlt-experiments/classification/output/weights/resnet_214.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
}
```
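To sanity-check what the soft-anneal settings do over a 1000-epoch run, here is a hedged sketch of how such a schedule is commonly interpreted (warm up to the base LR over the `soft_start` fraction of training, then divide by `annealing_divider` at each annealing point). This is an assumption about the schedule's shape based on the parameter names, not TLT's actual implementation.

```python
# Assumed soft-anneal schedule: linear warm-up over the soft_start
# fraction of total progress, then divide the LR by `divider` each
# time an annealing point is passed. Shape is an assumption.
def soft_anneal_lr(progress, base_lr=0.006, soft_start=0.056,
                   points=(0.3, 0.6, 0.8), divider=10.0):
    if progress < soft_start:
        # Linear ramp from 0 to base_lr (ramp shape is an assumption).
        return base_lr * progress / soft_start
    lr = base_lr
    for p in points:
        if progress >= p:
            lr /= divider
    return lr

if __name__ == "__main__":
    # Under this interpretation, the LR has been cut 1000x by 80% of training.
    for t in (0.02, 0.2, 0.5, 0.7, 0.9):
        print(t, soft_anneal_lr(t))
```

Under this interpretation, by epoch 800 of 1000 the LR would be 6e-6, which would explain a training curve that stops moving late in the run.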
I have also tried the Adam optimizer; that config is below:
```
model_config {
  arch: "resnet"
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,96,96"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/split/train"
  val_dataset_path: "/workspace/tlt-experiments/data/split/val"
  pretrained_model_path: "/workspace/tlt-experiments/classification/pretrained_resnet18/tlt_pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer: "adam"
  batch_size_per_gpu: 32
  n_epochs: 1000
  n_workers: 16
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    scheduler: "step"
    learning_rate: 0.006
    # soft_start: 0.056
    # annealing_points: "0.3, 0.6, 0.8"
    # annealing_divider: 10
    step_size: 10
    gamma: 0.1
  }
}
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/split/test"
  model_path: "/workspace/tlt-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
}
```
Can you please suggest where we are going wrong? I have trained the model on both a 2080 Ti and a V100.
Please find some of the training logs below:
```
Epoch 744/1000
69/69 [==============================] - 7s 106ms/step - loss: 0.1665 - acc: 1.0000 - val_loss: 0.1739 - val_acc: 0.9984
Epoch 745/1000
69/69 [==============================] - 7s 106ms/step - loss: 0.1665 - acc: 1.0000 - val_loss: 0.1729 - val_acc: 0.9968
Epoch 746/1000
69/69 [==============================] - 7s 106ms/step - loss: 0.1665 - acc: 1.0000 - val_loss: 0.1735 - val_acc: 0.9968
Epoch 747/1000
69/69 [==============================] - 7s 106ms/step - loss: 0.1665 - acc: 1.0000 - val_loss: 0.1725 - val_acc: 0.9968
Epoch 748/1000
69/69 [==============================] - 7s 106ms/step - loss: 0.1665 - acc: 1.0000 - val_loss: 0.1725 - val_acc: 0.9968
```
Thanks.