TAO Classifier MobileNetV2 very low accuracy compared to EfficientNet-B0 & ResNet

tane.vanderboon · November 29, 2021, 1:45am

Hi,

After training a 2-class classifier with the command tao classification train -e ... --gpus 2, we are seeing terrible accuracy from mobilenetv2.

I was hoping someone could have a look at the config below and point out any potential errors. This is quite a simple person shirt type classification task: Hivis or not Hivis.

Accuracy comparison (trained in TAO unless noted):

ResNet50: 99%
EfficientNet-B0 (pruned and retrained): 96%
MobileNetV2: ~65%
MobileNetV3 (not TAO trained): ~95%

MobilenetV2 training config file:

model_config {
  # Model Architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet']
  arch: "mobilenet_v2"
  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]
  #n_layers: 50
  use_batch_norm: False
  use_bias: False
  all_projections: False
  use_pooling: False
  use_imagenet_head: False
  resize_interpolation_method: BICUBIC
  # if you want to use the pretrained model,
  # image size should be "3,224,224"
  # otherwise, it can be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/iclassifier/data/train/"
  val_dataset_path: "/classifier/data/test/"
  #pretrained_model_path: "/path/to/your/pretrained/model"
  # Only ['sgd', 'adam'] are supported for optimizer
  optimizer {
      sgd {
      lr: 0.1
      decay: 0.0
      momentum: 0.01
      nesterov: True
      }
  }
  batch_size_per_gpu: 512
  n_epochs: 120
  # Number of CPU cores for loading data
  n_workers: 40
  # regularizer
  reg_config {
      # regularizer type can be "L1", "L2" or "None".
      type: "L2"
      # if the type is not "None",
      # scope can be either "Conv2D" or "Dense" or both.
      scope: "Conv2D,Dense"
      # 0 < weight decay < 1
      weight_decay: 0.000015
  }
  # learning_rate
  lr_config {
      cosine {
      learning_rate: 0.1
      soft_start: 0.0
      min_lr_ratio: 0.001
      }
  }
  enable_random_crop: False
  enable_center_crop: False
  enable_color_augmentation: False
  mixup_alpha: 0.2
  label_smoothing: 0.1
  preprocess_mode: "torch"
}
eval_config {
  eval_dataset_path: "/classifier/data/test/"
  model_path: "/classifier/model/"
  top_k: 3
  batch_size: 512
  n_workers: 20
  enable_center_crop: False
}
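
For readers unfamiliar with the cosine block in lr_config, here is a rough sketch of what such a schedule usually looks like. The exact formula TAO applies is not shown in this thread, so the function below is an assumption (linear warm-up over the soft_start fraction, then cosine decay down to learning_rate * min_lr_ratio). The name cosine_lr is hypothetical and only mirrors the parameter names from the spec.

```python
import math


def cosine_lr(progress, base_lr=0.1, soft_start=0.0, min_lr_ratio=0.001):
    """Learning rate at training progress in [0, 1].

    Assumed formula for illustration only; not taken from TAO's source.
    """
    if soft_start > 0.0 and progress < soft_start:
        # Linear warm-up from 0 to base_lr over the soft-start fraction.
        return base_lr * progress / soft_start
    # Cosine decay from base_lr down to base_lr * min_lr_ratio.
    span = (progress - soft_start) / (1.0 - soft_start)
    factor = 0.5 * (1.0 + math.cos(math.pi * span))
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * factor)


n_epochs = 120  # n_epochs from the spec above
for epoch in (0, 30, 60, 90, 120):
    print(epoch, round(cosine_lr(epoch / n_epochs), 6))
```

Under this assumption, a soft_start of 0.0 means training begins at the full learning rate of 0.1 with no warm-up.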

EfficientNet-B0 training config file:

model_config {
  arch: "efficientnet_b0"
  use_bias: False
  use_imagenet_head: False
  resize_interpolation_method: BICUBIC
  input_image_size: "3,224,224"
}
train_config {
  preprocess_mode: "caffe"
  train_dataset_path: "/classifier/data/train/"
  val_dataset_path: "/classifier/data/test/"
  #pretrained_model_path: "/inviol/hivis_classifier/model/weights/efficientnet_b0_042_pruned.tlt"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 256
  n_epochs: 120
  n_workers: 60
  reg_config {
    type: "None"
    scope: "Conv2D,Dense"
    weight_decay: 5e-5
  }
  lr_config {
    cosine {
      learning_rate: 0.05
      min_lr_ratio: 0.001
    }
  }
  enable_random_crop: True
  enable_center_crop: True
  enable_color_augmentation: False
  mixup_alpha: 0.2
  label_smoothing: 0.1
}
eval_config {
  eval_dataset_path: "/classifier/data/test/"
  #model_path: "/classifier/model/retrain/"
  top_k: 3
  batch_size: 256
  n_workers: 60
  enable_center_crop: False
}
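
Since the two spec files above differ in several places, one way to compare them is to flatten each into key/value pairs and print only the keys that differ. The sketch below is a minimal, hand-rolled parser for this protobuf-text style, not a TAO utility; parse_spec and diff_specs are hypothetical helper names, and the two strings are short excerpts of the specs above. It only lists differences and does not say which of them, if any, matters for accuracy.

```python
def parse_spec(text):
    """Flatten a protobuf-text-style spec into {'section.sub.key': 'value'}.

    Simplifications: '#' always starts a comment, and a repeated key
    overwrites the earlier value.
    """
    flat, stack = {}, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())  # enter a nested block
        elif line == "}":
            stack.pop()  # leave the current block
        else:
            key, value = line.split(":", 1)
            flat[".".join(stack + [key.strip()])] = value.strip().strip('"')
    return flat


def diff_specs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpts copied from the MobileNetV2 and EfficientNet-B0 specs above.
mobilenet = """
train_config {
  optimizer {
    sgd {
      lr: 0.1
      momentum: 0.01
      nesterov: True
    }
  }
  batch_size_per_gpu: 512
  preprocess_mode: "torch"
}
"""
efficientnet = """
train_config {
  preprocess_mode: "caffe"
  optimizer {
    sgd {
      lr: 0.01
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 256
}
"""

differences = diff_specs(parse_spec(mobilenet), parse_spec(efficientnet))
for key, (left, right) in differences.items():
    print(f"{key}: {left} vs {right}")
```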

Other info:

• Training Hardware: 2 x RTX A6000 GPU
• Network Type: MobilenetV2
• TLT Version: tao/tao-toolkit-tf: v3.21.11-tf1.15.5-py3

Morganh · November 29, 2021, 7:41am

According to the TAO Toolkit 3.22.02 documentation (Open Images Pre-trained Image Classification) and the TAO Pretrained Classification page on NVIDIA NGC, the pretrained Mobilenetv2 model has an accuracy of 72.75 and the pretrained resnet50 model has an accuracy of 77.91.

Did you use the mobilenetv2 pretrained model in the training?

pullmyleg · November 29, 2021, 8:21am

Hi @Morganh ,

Both models are trained from scratch with our own data.

We did not use any pre-trained models for training as per the configs above.

Given the vast discrepancy between the accuracies on the same dataset, I think there is an issue with the mobilenetv2 config. Could someone review it?

Thanks!

Morganh · November 30, 2021, 6:29am

Can you try Mobilenetv2 with the tao/tao-toolkit-tf: v3.21.08-py3 docker?
I just want to narrow down the cause.

Morganh · November 30, 2021, 6:51am

Please set
use_batch_norm: True

and
enable_random_crop: True
enable_center_crop: True
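
If it helps, the three changes above can be applied to a spec file programmatically. This is only a sketch using a regular expression on the spec text; set_spec_value is a hypothetical helper, not part of TAO. Note that it rewrites every uncommented occurrence of a key, so on a full spec it would also change enable_center_crop inside eval_config.

```python
import re


def set_spec_value(spec_text, key, value):
    """Set `key: value` on every uncommented line defining `key`, keeping indentation."""
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}:[ \t]*.*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{key}: {value}", spec_text
    )
    if count == 0:
        raise KeyError(f"{key} not found in spec")
    return new_text


# Excerpt of the MobileNetV2 spec from this thread (train_config part only).
spec = """model_config {
  arch: "mobilenet_v2"
  use_batch_norm: False
}
train_config {
  enable_random_crop: False
  enable_center_crop: False
}
"""

for key in ("use_batch_norm", "enable_random_crop", "enable_center_crop"):
    spec = set_spec_value(spec, key, "True")
print(spec)
```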

tane.vanderboon · December 1, 2021, 7:13pm

Thanks @Morganh, but changing those parameters hasn’t helped. I am still getting 65.5% accuracy.

I will try v3.21.08 and report back.
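
One possible sanity check when a two-class model stalls at a fixed accuracy is whether that number matches the share of the larger class in the validation set, which would suggest the model predicts a single class. This is a guess at a useful diagnostic, not a confirmed cause. The sketch assumes the dataset uses one subfolder per class; class_shares is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def class_shares(dataset_dir):
    """Return {class_name: fraction of files} for a class-per-subfolder dataset."""
    counts = Counter()
    for class_dir in Path(dataset_dir).iterdir():
        if class_dir.is_dir():
            # Count only regular files directly inside the class folder.
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no files found under {dataset_dir}")
    return {name: n / total for name, n in counts.items()}
```

Calling class_shares on the directory given as val_dataset_path in the spec would return the fraction of images per class, which can then be compared with the 65.5% figure.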

tane.vanderboon · December 2, 2021, 11:04pm

Hi @Morganh, how do you install previous versions of the TAO Toolkit?

Morganh · December 3, 2021, 12:45am

You can pull the docker image from

or, for an older version, from Transfer Learning Toolkit for Video Streaming Analytics | NVIDIA NGC

Then run with
$ docker run --runtime=nvidia -it --rm -v localfolder:dockerfolder <docker> /bin/bash
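
To run the same experiment against several container versions, the docker run line above can be generated per tag. The sketch below only builds and prints the command; it does not start a container. The registry path is an assumption based on the nvcr.io image names used elsewhere in this thread, the host and container paths are placeholders, and docker_run_argv is a hypothetical helper name.

```python
import shlex

# Assumed registry path for the TAO TF container; verify it on NGC before use.
REGISTRY_IMAGE = "nvcr.io/nvidia/tao/tao-toolkit-tf"


def docker_run_argv(tag, local_dir, container_dir):
    """Build the argv for the interactive `docker run` shown above."""
    return [
        "docker", "run", "--runtime=nvidia", "-it", "--rm",
        "-v", f"{local_dir}:{container_dir}",  # mount data and specs
        f"{REGISTRY_IMAGE}:{tag}",
        "/bin/bash",
    ]


# Placeholder paths; replace with the folder holding the dataset and spec file.
for tag in ("v3.21.08-py3", "v3.21.11-tf1.15.5-py3"):
    print(shlex.join(docker_run_argv(tag, "/path/on/host", "/workspace")))
```

Passing the returned list to subprocess.run would launch the container, assuming Docker and the NVIDIA runtime are installed on the host.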

tane.vanderboon · December 6, 2021, 3:03am

@Morganh there must be a bug in the latest TAO Toolkit (v3.21.11-tf1.15.5-py3). I am getting 93% accuracy on the 2nd epoch for mobilenetv2 when using v3.21.08 directly from docker. It will not get past 65% on the latest version.

Morganh · December 6, 2021, 3:13am

Unfortunately, there is an issue. We will check further and follow up. Thanks for catching this.
We appreciate your work!

Morganh · December 6, 2021, 6:50am

Hi,
If possible, can you try the v3.21.11-tf1.15.4-py3 docker?
$ docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3

tane.vanderboon · December 6, 2021, 7:15pm

@Morganh I can confirm the issue also exists in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3

After 2 epochs the accuracy is 41% with v3.21.11-tf1.15.4-py3, compared to 90% when using v3.21.08.

It’s actually considerably worse than the latest TAO Toolkit.

system · December 20, 2021, 7:15pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Morganh · January 4, 2022, 2:41pm

Hi,
Sorry for the late reply. After running experiments on 3.21.08, v3.21.11-tf1.15.5-py3, and v3.21.11-tf1.15.4-py3, I cannot find the mAP drop for the mobilenetv2 backbone.
I ran the experiments according to the spec file inside the Jupyter notebook.
The training ran on the 20 classes of the VOC dataset.
Please try to run on it if you have time. Thanks.
