Training set accuracy is lower than validation set accuracy for a classification task

Hi everyone,
I trained a classification model with TAO.
I used ResNet18 and divided my dataset into training, validation, and test sets (70%, 10%, 20%).
I trained the model several times, and each time the validation and test set accuracies were higher than the training set accuracy. I shuffled the dataset several times, but the result was always the same: the training set accuracy was lower than the validation and test set accuracies. I also changed the train/val/test ratio to (40%, 40%, 20%), but the result did not change.
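(A minimal sketch of one way to produce such a stratified 70/10/20 split — this uses scikit-learn rather than anything TAO-specific, and the file list and labels are placeholders, since the actual splitting code was not shared in this thread:)

from sklearn.model_selection import train_test_split

# Placeholder data: image paths and matching class ids.
files = [f"img_{i}.jpg" for i in range(1000)]
labels = [i % 5 for i in range(1000)]

# Carve off 20% for test, then 12.5% of the remaining 80% for val (= 10% overall).
train_files, test_files, train_labels, test_labels = train_test_split(
    files, labels, test_size=0.20, stratify=labels, shuffle=True
)
train_files, val_files, train_labels, val_labels = train_test_split(
    train_files, train_labels, test_size=0.125, stratify=train_labels
)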
The best result was:
training set accuracy: 0.94
validation set accuracy: 0.98
test set accuracy: 0.97

classification model => ResNet18
TAO version => v3.21.08-py3
GPU => RTX 2080

Please use the 3.21.11 docker and try the setting below.
activation {
  activation_type: "mish"
}

In one experiment with the above setting, I can see that the training set accuracy is higher than the validation accuracy.
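(For context, mish is the smooth activation mish(x) = x * tanh(softplus(x)); a minimal NumPy sketch of the function itself, not TAO's internal implementation:)

import numpy as np

def mish(x):
    # mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * np.tanh(np.log1p(np.exp(x)))

print(mish(np.array([-2.0, 0.0, 2.0])))  # ~[-0.2525, 0.0, 1.9440]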

Unfortunately, that did not change the results. I also tried 'relu' and 'swish', but the results did not change. I am sharing my training config; please take a look.
Thanks a lot.

model_config {
  arch: "resnet"
  n_layers: 18
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
  activation {
    activation_type: "mish"
  }
}

train_config {
  train_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/train"
  val_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/val"
  pretrained_model_path: "/media/user/data1/workspace/mirzaei/new_tao_518/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 30
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}

eval_config {
  eval_dataset_path: "/media/user/data1/workspace/mirzaei/new_tao_518/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
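(An aside on the spec above: enable_random_crop and mixup_alpha are train-time-only augmentations, so the accuracy reported during training is computed on randomly cropped, label-mixed batches, while the validation accuracy is computed on clean center crops. A minimal NumPy sketch of mixup, following the standard mixup formulation rather than TAO's internal code:)

import numpy as np

def mixup_batch(x, y, alpha=0.1, rng=np.random.default_rng(0)):
    # Sample one mixing coefficient lam ~ Beta(alpha, alpha) per batch and
    # blend each example with a randomly chosen partner; the one-hot labels
    # are blended the same way, so the training targets become soft.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]

# With alpha = 0.1 most lam values land near 0 or 1, so the mixing is mild,
# but it still depresses train-time accuracy relative to clean evaluation.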
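(Likewise, my reading of the step lr_config above — an assumption about the scheduler's semantics, not confirmed in this thread — is that the learning rate is multiplied by gamma every step_size epochs:)

def step_lr(epoch, base_lr=0.006, step_size=10, gamma=0.1):
    # lr = base_lr * gamma ** floor(epoch / step_size)
    return base_lr * gamma ** (epoch // step_size)

print([step_lr(e) for e in (0, 9, 10, 20)])  # 0.006, 0.006, 0.0006, 6e-05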

The results for each epoch are below:

Epoch 20/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6853 - acc: 0.9037 - val_loss: 0.3653 - val_acc: 0.9613
Epoch 21/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6938 - acc: 0.8991 - val_loss: 0.3636 - val_acc: 0.9624
Epoch 22/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6982 - acc: 0.8975 - val_loss: 0.3647 - val_acc: 0.9634
Epoch 23/30 201/201 [==============================] - 35s 174ms/step - loss: 0.6925 - acc: 0.9022 - val_loss: 0.3633 - val_acc: 0.9634
Epoch 24/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6901 - acc: 0.8987 - val_loss: 0.3635 - val_acc: 0.9651
Epoch 25/30 201/201 [==============================] - 34s 171ms/step - loss: 0.7047 - acc: 0.8998 - val_loss: 0.3647 - val_acc: 0.9629
Epoch 26/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6970 - acc: 0.9007 - val_loss: 0.3662 - val_acc: 0.9640
Epoch 27/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6844 - acc: 0.9024 - val_loss: 0.3664 - val_acc: 0.9645
Epoch 28/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6878 - acc: 0.9026 - val_loss: 0.3647 - val_acc: 0.9651
Epoch 29/30 201/201 [==============================] - 34s 171ms/step - loss: 0.6908 - acc: 0.9017 - val_loss: 0.3656 - val_acc: 0.9634
Epoch 30/30 201/201 [==============================] - 34s 170ms/step - loss: 0.6897 - acc: 0.9042 - val_loss: 0.3646 - val_acc: 0.9656
2022-04-13 07:41:19,449 [INFO] __main__: Total Val Loss: 0.36458083987236023
2022-04-13 07:41:19,449 [INFO] __main__: Total Val accuracy: 0.9656301140785217
2022-04-13 07:41:19,450 [INFO] __main__: Training finished successfully.
2022-04-13 12:11:20,808 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Thank you very much.

Actually, it is normal to get this result, since they are different datasets:
the training dataset is different from the test or val dataset.
We cannot conclude that, for the TAO classification network, the training accuracy will always be lower than the val accuracy.
Sometimes the val accuracy is higher and sometimes the training accuracy is higher.
You can try more experiments on other datasets. My result above is from running on the ImageNet dataset.

Your comment is true, but I also trained ResNet in the TensorFlow framework with the same parameters as in the TAO train config file, and there the training set accuracy was higher than the validation set accuracy.
Could you tell me whether my train config file is correct? Are all the parameters right?

I also shuffled my dataset several times and changed the data distribution.

Please try more experiments:

  • other backbones
  • or training without a pretrained model

For example, the config below is an example for training on the ImageNet dataset.

model_config {
  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'cspdarknet', ...]
  arch: "cspdarknet"

  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]
  n_layers: 53
  use_batch_norm: True
  use_bias: False
  use_imagenet_head: True
  all_projections: False
  use_pooling: True
  # if you want to use the pretrained model,
  # image size should be "3,224,224"
  # otherwise, it can be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
  activation {
     activation_type: "mish"
     }
}
train_config {
  train_dataset_path: "/raid/ImageNet2012/ImageNet2012/train"
  val_dataset_path: "/raid/ImageNet2012/ImageNet2012/val"
  # Only ['sgd', 'adam'] are supported for optimizer
  optimizer {
    sgd {
        lr: 0.01
        decay: 0.0
        momentum: 0.9
        nesterov: False
    }
  }
  preprocess_mode: "torch"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  batch_size_per_gpu: 64
  n_epochs: 300
  mixup_alpha: 0.2

  # Number of CPU cores for loading data
  n_workers: 40

  # regularizer
  reg_config {
    # regularizer type can be "L1", "L2" or "None".
    type: "L2"
    # if the type is not "None",
    # scope can be either "Conv2D" or "Dense" or both.
    scope: "Conv2D,Dense"
    # 0 < weight decay < 1
    weight_decay: 0.00003
  }

  # learning_rate
  lr_config {
    cosine{
        learning_rate: 0.05
        soft_start: 0.0
        min_lr_ratio: 0.001
    }
  }
}
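(For completeness, a spec like this is launched with the TAO classification entrypoint; a minimal sketch via Python's subprocess, where every path and the key are placeholders for your own environment:)

import subprocess

# Launch TAO classification training against the spec above.
subprocess.run(
    [
        "tao", "classification", "train",
        "-e", "/workspace/tao-experiments/classification/spec.cfg",  # spec file (placeholder)
        "-r", "/workspace/tao-experiments/classification/output",    # results dir (placeholder)
        "-k", "YOUR_NGC_KEY",                                        # encryption key (placeholder)
        "--gpus", "1",
    ],
    check=True,
)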
