Training set accuracy is lower than validation set accuracy for a classification task

Hi everyone,
I trained a classification model with TAO.
I used ResNet18 and divided my dataset into training, validation, and test sets (70%, 10%, 20%).
I trained the model several times, and each time the validation and test set accuracies were higher than the training set accuracy. I shuffled the dataset several times, but the result was always the same: the training set accuracy was lower than the validation and test set accuracies. I also changed the train/val/test ratio to (40%, 40%, 20%), but the result did not change.
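(A minimal sketch of one way to produce such a stratified 70/10/20 split — this uses scikit-learn rather than anything TAO-specific, and the file list and labels are placeholders, since the actual splitting code was not shared in this thread:)

from sklearn.model_selection import train_test_split

# Placeholder data: image paths and matching class ids.
files = [f"img_{i}.jpg" for i in range(1000)]
labels = [i % 5 for i in range(1000)]

# Carve off 20% for test, then 12.5% of the remaining 80% for val (= 10% overall).
train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.20, stratify=labels, shuffle=True
)
train_files, val_files, train_labels, val_labels = train_test_split(
    train_files, train_labels, test_size=0.125, stratify=train_labels
)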
The best result was:
training set accuracy: 0.94
validation set accuracy: 0.98
test set accuracy: 0.97

classification model => ResNet18
TAO version => v3.21.08-py3
GPU => RTX 2080

Please use the 3.21.11 docker and try the setting below.
activation {
  activation_type: "mish"
}

In one experiment with the above setting, I can see that the training set accuracy is higher than the validation accuracy.
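(For context, mish is the smooth activation mish(x) = x * tanh(softplus(x)); a minimal NumPy sketch of the function itself, not TAO's internal implementation:)

import numpy as np

def mish(x):
    # mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * np.tanh(np.log1p(np.exp(x)))

print(mish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.2525, 0.0, 1.9440]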

Unfortunately, that did not change the results. I also tried 'relu' and 'swish', but the results did not change. I am sharing my training config; please take a look.
Thanks a lot.

model_config {
  arch: "resnet"
  n_layers: 18
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
  activation {
    activation_type: "mish"
  }
}

train_config {
  train_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/train"
  val_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/val"
  pretrained_model_path: "/media/user/data1/workspace/mirzaei/new_tao_518/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 30
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}

eval_config {
  eval_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
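(An aside on the spec above: enable_random_crop and mixup_alpha are train-time-only augmentations, so the accuracy reported during training is computed on randomly cropped, label-mixed batches, while the validation accuracy is computed on clean center crops. A minimal NumPy sketch of mixup, following the standard mixup formulation rather than TAO's internal code:)

import numpy as np

def mixup_batch(x, y, alpha=0.1, rng=np.random.default_rng(0)):
    # Sample one mixing coefficient lam ~ Beta(alpha, alpha) per batch and
    # blend each example with a randomly chosen partner; the one-hot labels
    # are blended the same way, so the training targets become soft.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

# With alpha = 0.1 most lam values land near 0 or 1, so the mixing is mild,
# but it still depresses train-time accuracy relative to clean evaluation.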
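(Likewise, my reading of the step lr_config above — an assumption about the scheduler's semantics, not confirmed in this thread — is that the learning rate is multiplied by gamma every step_size epochs:)

def step_lr(epoch, base_lr=0.006, step_size=10, gamma=0.1):
    # lr = base_lr * gamma ** floor(epoch / step_size)
    return base_lr * gamma ** (epoch // step_size)

print([step_lr(e) for e in (0, 9, 10, 20)])  # 0.006, 0.006, 0.0006, 6e-05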

The results for each epoch are below:

Epoch 20/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6853 - acc: 0.9037 - val_loss: 0.3653 - val_acc: 0.9613
Epoch 21/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6938 - acc: 0.8991 - val_loss: 0.3636 - val_acc: 0.9624
Epoch 22/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6982 - acc: 0.8975 - val_loss: 0.3647 - val_acc: 0.9634
Epoch 23/30 201/201 [==============================] - 35s 174ms/step - loss: 0.6925 - acc: 0.9022 - val_loss: 0.3633 - val_acc: 0.9634
Epoch 24/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6901 - acc: 0.8987 - val_loss: 0.3635 - val_acc: 0.9651
Epoch 25/30 201/201 [==============================] - 34s 171ms/step - loss: 0.7047 - acc: 0.8998 - val_loss: 0.3647 - val_acc: 0.9629
Epoch 26/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6970 - acc: 0.9007 - val_loss: 0.3662 - val_acc: 0.9640
Epoch 27/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6844 - acc: 0.9024 - val_loss: 0.3664 - val_acc: 0.9645
Epoch 28/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6878 - acc: 0.9026 - val_loss: 0.3647 - val_acc: 0.9651
Epoch 29/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6908 - acc: 0.9017 - val_loss: 0.3656 - val_acc: 0.9634
Epoch 30/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6897 - acc: 0.9042 - val_loss: 0.3646 - val_acc: 0.9656
2022-04-13 07:41:19,449 [INFO] __main__: Total Val Loss: 0.36458083987236023
2022-04-13 07:41:19,449 [INFO] __main__: Total Val accuracy: 0.9656301140785217
2022-04-13 07:41:19,450 [INFO] __main__: Training finished successfully.
2022-04-13 12:11:20,808 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Thank you very much.

Actually, it is normal to get this result, since they are different datasets:
the training dataset is different from the test or val dataset.
We cannot conclude that, for the TAO classification network, the training accuracy will always be lower than the val accuracy.
Sometimes the val accuracy is higher and sometimes the training accuracy is higher.
You can try more experiments on other datasets. My result above is from running on the ImageNet dataset.

Your comment is true, but I also trained ResNet in the TensorFlow framework with the same parameters as in the TAO train config file, and there the training set accuracy was higher than the validation set accuracy.
Could you tell me whether my train config file is correct? Are all the parameters right?

I also shuffled my dataset several times and changed the data distribution.

Please try more experiments:

  • other backbones
  • or training without a pretrained model

For example, the config below is an example for training on the ImageNet dataset.

model_config {
  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'cspdarknet', ...]
  arch: "cspdarknet"

  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]
  n_layers: 53
  use_batch_norm: True
  use_bias: False
  use_imagenet_head: True
  all_projections: False
  use_pooling: True
  # if you want to use the pretrained model,
  # image size should be "3,224,224"
  # otherwise, it can be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
  activation {
     activation_type: "mish"
     }
}
train_config {
  train_dataset_path: "/raid/ImageNet2012/ImageNet2012/train"
  val_dataset_path: "/raid/ImageNet2012/ImageNet2012/val"
  # Only ['sgd', 'adam'] are supported for optimizer
  optimizer {
    sgd {
        lr: 0.01
        decay: 0.0
        momentum: 0.9
        nesterov: False
    }
  }
  preprocess_mode: "torch"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  batch_size_per_gpu: 64
  n_epochs: 300
  mixup_alpha: 0.2

  # Number of CPU cores for loading data
  n_workers: 40

  # regularizer
  reg_config {
    # regularizer type can be "L1", "L2" or "None".
    type: "L2"
    # if the type is not "None",
    # scope can be either "Conv2D" or "Dense" or both.
    scope: "Conv2D,Dense"
    # 0 < weight decay < 1
    weight_decay: 0.00003
  }

  # learning_rate
  lr_config {
    cosine{
        learning_rate: 0.05
        soft_start: 0.0
        min_lr_ratio: 0.001
    }
  }
}
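(For completeness, a spec like this is launched with the TAO classification entrypoint; a minimal sketch via Python's subprocess, where every path and the key are placeholders for your own environment:)

import subprocess

# Launch TAO classification training against the spec above.
subprocess.run(
    [
        "tao", "classification", "train",
        "-e", "/workspace/tao-experiments/classification/spec.cfg",  # spec file (placeholder)
        "-r", "/workspace/tao-experiments/classification/output",    # results dir (placeholder)
        "-k", "YOUR_NGC_KEY",                                        # encryption key (placeholder)
        "--gpus", "1",
    ],
    check=True,
)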
