Poor results with resnet10 and resnet18 on TLT v2 when using the model in DeepStream

Hi Team,

I have trained a model for age classification (on the person bounding box, not the face) using resnet10 on TLT v2.
Classes: age_0-15, age_16-35, age_36-55, age_55+
My results on my test data are:

Train resnet10, epoch 102:
Confusion Matrix
[[248   5   1   0]
 [  6 207  37   4]
 [  1  15 235   3]
 [  0   0   0 254]]

Classification Report
              precision    recall  f1-score   support

    age_0-15       0.97      0.98      0.97       254
   age_16-35       0.91      0.81      0.86       254
   age_36-55       0.86      0.93      0.89       254
     age_55+       0.97      1.00      0.99       254

   micro avg       0.93      0.93      0.93      1016
   macro avg       0.93      0.93      0.93      1016
weighted avg       0.93      0.93      0.93      1016
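As a sanity check, the per-class precision and recall in the report can be recomputed directly from the confusion matrix (assuming rows are the true class and columns the predicted class); a minimal sketch with NumPy:

```python
import numpy as np

# Confusion matrix from the report above: rows = true class, columns = predicted class
cm = np.array([
    [248,   5,   1,   0],   # age_0-15
    [  6, 207,  37,   4],   # age_16-35
    [  1,  15, 235,   3],   # age_36-55
    [  0,   0,   0, 254],   # age_55+
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)   # correct predictions / all predictions of that class
recall    = tp / cm.sum(axis=1)   # correct predictions / all samples of that class

for name, p, r in zip(["age_0-15", "age_16-35", "age_36-55", "age_55+"],
                      precision, recall):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}")
```

These recomputed values match the report (e.g. age_16-35 precision 0.91, recall 0.81), which confirms the offline evaluation itself is consistent.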

Retrain resnet10, epoch 143:
Confusion Matrix
[[250   3   1   0]
 [  6 204  41   3]
 [  5  20 227   2]
 [  0   0   0 254]]

Classification Report
              precision    recall  f1-score   support

    age_0-15       0.96      0.98      0.97       254
   age_16-35       0.90      0.80      0.85       254
   age_36-55       0.84      0.89      0.87       254
     age_55+       0.98      1.00      0.99       254

   micro avg       0.92      0.92      0.92      1016
   macro avg       0.92      0.92      0.92      1016
weighted avg       0.92      0.92      0.92      1016

But when I deploy these models and test them live in DeepStream, I get the age_55+ class most of the time, even when the person belongs to age_0-15 or age_16-35. Can you please suggest where I am going wrong?
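A common cause of good offline metrics but wrong live predictions is a preprocessing mismatch between training and the DeepStream nvinfer config (color order, mean subtraction, input dimensions). As an illustration only, a secondary-classifier config might need properties along these lines; every value below is an assumption to be checked against your TLT version and training spec, not a verified setting:

```ini
# Hypothetical sgie config fragment for a TLT classification model (values are assumptions)
[property]
net-scale-factor=1.0
# TLT classification commonly uses caffe-style BGR mean subtraction; confirm for your model
offsets=103.939;116.779;123.68
model-color-format=1
# Should match input_image_size "3,400,200" from the training spec (C;H;W)
infer-dims=3;400;200
network-type=1
classifier-threshold=0.2
# Run as a secondary classifier on detected person objects
process-mode=2
```

If the offsets, color format, or input dims differ from what the model was trained with, the network can collapse onto one class at inference time even though evaluation inside TLT looks fine.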

My training spec file:

model_config {
  arch: "resnet"
  n_layers: 10
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,400,200"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/split/train"
  val_dataset_path: "/workspace/tlt-experiments/data/split/val"
  pretrained_model_path: "/workspace/tlt-experiments/classification/pretrained_resnet10/tlt_pretrained_classification_vresnet10/resnet_10.hdf5"
  optimizer: "sgd"
  batch_size_per_gpu: 64
  n_epochs: 300
  n_workers: 16

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    scheduler: "step"
    learning_rate: 0.006
    #soft_start: 0.056
    #annealing_points: "0.3, 0.6, 0.8"
    #annealing_divider: 10
    step_size: 10
    gamma: 0.1
  }
}
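For reference, the step scheduler above decays the learning rate by `gamma` at fixed intervals. A minimal sketch of how such a schedule behaves, assuming `step_size` is an epoch interval (check the TLT docs; some versions interpret it as a percentage of `n_epochs`):

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Step-decay schedule: multiply the LR by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# With the values from the spec above (base_lr=0.006, gamma=0.1, step_size=10),
# the LR becomes vanishingly small long before epoch 300:
print(step_lr(0.006, 0.1, 10, 0))    # epoch 0
print(step_lr(0.006, 0.1, 10, 25))   # epoch 25
```

If that epoch interpretation holds, most of a 300-epoch run would train at a negligible learning rate, which is worth ruling out.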
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/split/test"
  model_path: "/workspace/tlt-experiments/classification/output/weights/resnet_102.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
}
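One way to debug the train/deploy gap is to replicate the expected preprocessing offline and compare predictions against what DeepStream produces. A minimal sketch: the resize target follows `input_image_size: "3,400,200"` (i.e. height 400 x width 200), and the BGR mean values are the common caffe-mode defaults; both are assumptions to verify against your TLT version:

```python
import numpy as np

# Hypothetical caffe-style preprocessing for a TLT classification model.
# Assumption: input_image_size "3,400,200" means channels=3, height=400, width=200.
MEANS_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)  # assumed defaults

def preprocess(rgb_image: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB crop into the NCHW float tensor the network expects."""
    assert rgb_image.ndim == 3 and rgb_image.shape[2] == 3
    bgr = rgb_image[..., ::-1].astype(np.float32)  # RGB -> BGR
    bgr -= MEANS_BGR                               # per-channel mean subtraction
    chw = np.transpose(bgr, (2, 0, 1))             # HWC -> CHW
    return chw[np.newaxis, ...]                    # add batch dim -> (1, 3, H, W)

# Example with a dummy person crop already resized to 400x200:
crop = np.zeros((400, 200, 3), dtype=np.uint8)
tensor = preprocess(crop)
print(tensor.shape)  # (1, 3, 400, 200)
```

If feeding the same person crops through this pipeline into the exported model reproduces the good offline accuracy, the problem is almost certainly in the DeepStream-side preprocessing rather than the model.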

I also tried resnet18 but am getting the same issue.

Please help.
Thanks.

Please raise TLT-related issues in the forum below:
https://forums.developer.nvidia.com/c/accelerated-computing/intelligent-video-analytics/transfer-learning-toolkit/17

Thanks

Okay, thanks.