Poor results with resnet10 and resnet18 on TLT-V2 when using the model in DeepStream

Hi Team,

I have trained an age-classification model (on person bounding boxes, not faces) using resnet10 on TLT-V2.
Classes: age_0-15, age_16-35, age_36-55, age_55+
The results on my test data are:

Train Resnet10, Epoch 102:
Confusion Matrix
[[248   5   1   0]
 [  6 207  37   4]
 [  1  15 235   3]
 [  0   0   0 254]]
Classification Report
              precision  recall  f1-score  support
age_0-15         0.97     0.98     0.97      254
age_16-35        0.91     0.81     0.86      254
age_36-55        0.86     0.93     0.89      254
age_55+          0.97     1.00     0.99      254

micro avg        0.93     0.93     0.93     1016
macro avg        0.93     0.93     0.93     1016
weighted avg     0.93     0.93     0.93     1016
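As a sanity check, the per-class precision and recall above can be recomputed directly from the confusion matrix, assuming the usual scikit-learn convention that rows are ground truth and columns are predictions:

```python
import numpy as np

# Confusion matrix from the report above (rows = true class, cols = predicted).
cm = np.array([
    [248,   5,   1,   0],
    [  6, 207,  37,   4],
    [  1,  15, 235,   3],
    [  0,   0,   0, 254],
])

precision = cm.diagonal() / cm.sum(axis=0)  # correct / predicted per column
recall    = cm.diagonal() / cm.sum(axis=1)  # correct / actual per row
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(["age_0-15", "age_16-35", "age_36-55", "age_55+"],
                         precision, recall, f1):
    print(f"{name:10s} {p:.2f} {r:.2f} {f:.2f}")
```

The printed values match the classification report, which confirms the report is consistent with the matrix.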

Retrain Resnet10, Epoch 143:
Confusion Matrix
[[250   3   1   0]
 [  6 204  41   3]
 [  5  20 227   2]
 [  0   0   0 254]]
Classification Report
              precision  recall  f1-score  support
age_0-15         0.96     0.98     0.97      254
age_16-35        0.90     0.80     0.85      254
age_36-55        0.84     0.89     0.87      254
age_55+          0.98     1.00     0.99      254

micro avg        0.92     0.92     0.92     1016
macro avg        0.92     0.92     0.92     1016
weighted avg     0.92     0.92     0.92     1016

But when I deploy these models and test live in DeepStream, I get the age_55+ class most of the time, even when the person belongs to age_0-15 or age_16-35. Can you please suggest where I am going wrong?

My training_config file:

model_config {
  arch: "resnet",
  n_layers: 10
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,400,200"
}
train_config {
  train_dataset_path: "/workspace/tlt-experiments/data/split/train"
  val_dataset_path: "/workspace/tlt-experiments/data/split/val"
  pretrained_model_path: "/workspace/tlt-experiments/classification/pretrained_resnet10/tlt_pretrained_classification_vresnet10/resnet_10.hdf5"
  optimizer: "sgd"
  batch_size_per_gpu: 64
  n_epochs: 300
  n_workers: 16

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    scheduler: "step"
    learning_rate: 0.006
    #soft_start: 0.056
    #annealing_points: "0.3, 0.6, 0.8"
    #annealing_divider: 10
    step_size: 10
    gamma: 0.1
  }
}
eval_config {
  eval_dataset_path: "/workspace/tlt-experiments/data/split/test"
  model_path: "/workspace/tlt-experiments/classification/output/weights/resnet_102.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
}
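On the DeepStream side, a secondary-classifier (sgie) nvinfer config for a TLT classification model typically looks roughly like the sketch below. The file names, key, and blob names here are placeholders, not your actual files; the values worth double-checking against the training spec are the input dims (they must match input_image_size "3,400,200"), net-scale-factor/offsets, and model-color-format, since a mismatch there skews the output toward one class:

```
[property]
# Placeholder paths and key -- substitute your own exported model and labels.
tlt-encoded-model=age_resnet10.etlt
tlt-model-key=<your_ngc_key>
labelfile-path=age_labels.txt
# Must match input_image_size "3,400,200" (C,H,W) from training.
uff-input-dims=3;400;200;0
uff-input-blob-name=input_1
output-blob-names=predictions/Softmax
network-type=1            # 1 = classifier
# Preprocessing: y = net-scale-factor * (x - offsets).
# These must match what tlt-infer / training used.
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1      # 1 = BGR
classifier-threshold=0.2
process-mode=2            # run on detected objects (sgie)
```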

I also tried resnet18, but I am getting the same issue.

Please help.
Thanks.

Please run tlt-infer first to check its result.

Hi Morganh,

I have also tested the result using tlt-infer on the complete test data.
Each class contains 254 images.
age_0-15 -> 19 wrong
age_16-35 -> 26 wrong
age_36-55 -> 106 wrong
age_55+ -> 17 wrong

But running with DeepStream gives results biased towards the class age_55+.

Please suggest what we should do.

Thanks.

So, do you mean the result of tlt-infer is expected but DeepStream's result is not?

Yes, but the tlt-infer results also need improvement:
age_0-15 -> 19 wrong
age_16-35 -> 26 wrong
age_36-55 -> 106 wrong
age_55+ -> 17 wrong
We have to reach 0-1 wrong in each class.

I am seeing the model biased toward a single class with resnet10 and resnet18; with resnet50 I did not get biased results.
For all three models the training data was the same, and the configuration parameters were also the same except for the number of layers.
Can you please let me know where the gap is?

You mentioned that your tlt-infer result with resnet10 and resnet18 is not good. What is the training accuracy in the log? If it does not meet your requirement, you need to run more experiments to reach a higher training accuracy: try fine-tuning hyper-parameters, adding data, etc. That is a training topic.

You also mention that your resnet50 tlt-infer result is good. What is its training accuracy in the log? Does DeepStream run well with this resnet50 model? If there is a gap between DeepStream and tlt-infer, you need to check the DeepStream config first.
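When tlt-infer looks fine but DeepStream is biased toward one class, the usual culprit is a preprocessing mismatch: nvinfer normalizes each frame as y = net-scale-factor * (x - offsets), and if that does not reproduce what the model saw during training, the input distribution shifts and the classifier can collapse onto a single class. A minimal sketch of the effect (the caffe-style per-channel means are an assumption about how the TLT classifier was trained; check the training spec):

```python
import numpy as np

def deepstream_preprocess(x_bgr, net_scale_factor, offsets):
    """Mimic nvinfer: y = net_scale_factor * (x - offsets), per channel."""
    return net_scale_factor * (x_bgr - np.asarray(offsets).reshape(3, 1, 1))

rng = np.random.default_rng(0)
# C,H,W frame matching input_image_size "3,400,200", pixel values 0-255.
frame = rng.uniform(0, 255, size=(3, 400, 200))

# Assumed correct setting: mean subtraction only (caffe-style BGR means).
correct = deepstream_preprocess(frame, 1.0, [103.939, 116.779, 123.68])
# A plausible wrong setting: scale to 0-1 with no mean subtraction.
wrong = deepstream_preprocess(frame, 1.0 / 255.0, [0.0, 0.0, 0.0])

# The two feeds differ wildly in scale; a model trained on one distribution
# can output a single dominant class when fed the other.
print(abs(correct).mean(), abs(wrong).mean())
```

Comparing net-scale-factor and offsets in the nvinfer config against the preprocessing that tlt-infer applies is therefore a good first check.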

Training logs :
Resnet10:

118,0.99943741209564,0.12332773998735994,0.9212598425196851,0.39120948261867355
119,0.99943741209564,0.12247711290119402,0.9114173228346457,0.39215778225050196
120,0.99943741209564,0.12329318260593253,0.9133858267716536,0.39798230190915385
121,0.99915611814346,0.12298165021985559,0.9153543307086615,0.3944643869644075
122,0.9988748241912799,0.12335022698810165,0.9173228346456693,0.3978482203807418
123,0.9977496483825598,0.12459035501324174,0.9153543307086615,0.39405789537223307
124,0.99971870604782,0.12321298180250176,0.9212598425196851,0.3877223948324759

Resnet18:
115,0.9930656931696147,0.16175098495326773,0.875,0.6687979786371698
116,0.9930656931696147,0.16010095211711242,0.8979591836734694,0.6186932793685368
117,0.9948905109489051,0.15361222642181563,0.8647959183673469,0.7049501638631431
118,0.9945255474452555,0.15107576564280656,0.8903061224489796,0.6329286779676165
119,0.9945255474452555,0.15169603078469743,0.9005102040816326,0.5941275841727549
120,0.9934306566732644,0.15416788234762901,0.8877551020408163,0.6722254655799087
121,0.994525546923171,0.1563258668584545,0.8852040816326531,0.7187542656854707
122,0.9908759124087592,0.15480450430925746,0.8877551020408163,0.6677013991438613
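Reading these rows as epoch, training accuracy, training loss, validation accuracy, validation loss (an assumption based on the value ranges), the roughly 0.08 gap between training and validation accuracy suggests the smaller backbones are overfitting rather than underfitting. A quick way to tabulate the gap from the log lines:

```python
# Two sample rows from the resnet10 log above; assumed column order:
# epoch, train_acc, train_loss, val_acc, val_loss.
lines = """\
118,0.99943741209564,0.12332773998735994,0.9212598425196851,0.39120948261867355
124,0.99971870604782,0.12321298180250176,0.9212598425196851,0.3877223948324759
"""

for line in lines.strip().splitlines():
    epoch, train_acc, train_loss, val_acc, val_loss = line.split(",")
    gap = float(train_acc) - float(val_acc)
    print(f"epoch {epoch}: train_acc={float(train_acc):.3f} "
          f"val_acc={float(val_acc):.3f} gap={gap:.3f}")
```

If the gap stays near 0.08 across epochs, stronger regularization or more data is more likely to help than simply training longer.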

Okay, I will fine-tune the hyper-parameters and check the DS configuration as well.