TAO Classifier MobileNetV2: very low accuracy compared to EfficientNet-B0 & ResNet

Hi,

After training a 2-class classifier with the command tao classification train -e ... --gpus 2, we are seeing very poor accuracy from MobileNetV2.

Could someone have a look at the config below and point out any potential errors? This is a simple person shirt-type classification task: hi-vis or not hi-vis.

Accuracy comparison (trained in TAO unless noted):

  • ResNet50: 99%
  • EfficientNet-B0 (pruned and retrained): 96%
  • MobileNetV2: ~65%
  • MobileNetV3 (not TAO trained): ~95%

MobilenetV2 training config file:

model_config {
  # Model Architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet']
  arch: "mobilenet_v2"
  # for resnet --> n_layers can be [10, 18, 50]
  # for vgg --> n_layers can be [16, 19]
  #n_layers: 50
  use_batch_norm: False
  use_bias: False
  all_projections: False
  use_pooling: False
  use_imagenet_head: False
  resize_interpolation_method: BICUBIC
  # if you want to use the pretrained model,
  # image size should be "3,224,224"
  # otherwise, it can be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/iclassifier/data/train/"
  val_dataset_path: "/classifier/data/test/"
  #pretrained_model_path: "/path/to/your/pretrained/model"
  # Only ['sgd', 'adam'] are supported for optimizer
  optimizer {
      sgd {
      lr: 0.1
      decay: 0.0
      momentum: 0.01
      nesterov: True
      }
  }
  batch_size_per_gpu: 512
  n_epochs: 120
  # Number of CPU cores for loading data
  n_workers: 40
  # regularizer
  reg_config {
      # regularizer type can be "L1", "L2" or "None".
      type: "L2"
      # if the type is not "None",
      # scope can be either "Conv2D" or "Dense" or both.
      scope: "Conv2D,Dense"
      # 0 < weight decay < 1
      weight_decay: 0.000015
  }
  # learning_rate
  lr_config {
      cosine {
      learning_rate: 0.1
      soft_start: 0.0
      min_lr_ratio: 0.001
      }
  }
  enable_random_crop: False
  enable_center_crop: False
  enable_color_augmentation: False
  mixup_alpha: 0.2
  label_smoothing: 0.1
  preprocess_mode: "torch"
}
eval_config {
  eval_dataset_path: "/classifier/data/test/"
  model_path: "/classifier/model/"
  top_k: 3
  batch_size: 512
  n_workers: 20
  enable_center_crop: False
}

EfficientNet-B0 training config file:

model_config {
  arch: "efficientnet_b0"
  use_bias: False
  use_imagenet_head: False
  resize_interpolation_method: BICUBIC
  input_image_size: "3,224,224"
}
train_config {
  preprocess_mode: "caffe"
  train_dataset_path: "/classifier/data/train/"
  val_dataset_path: "/classifier/data/test/"
  #pretrained_model_path: "/inviol/hivis_classifier/model/weights/efficientnet_b0_042_pruned.tlt"
  optimizer {
    sgd {
    lr: 0.01
    decay: 0.0
    momentum: 0.9
    nesterov: False
  }
  }
  batch_size_per_gpu: 256
  n_epochs: 120
  n_workers: 60
  reg_config {
    type: "None"
    scope: "Conv2D,Dense"
    weight_decay: 5e-5
  }
  lr_config {
    cosine {
      learning_rate: 0.05
      min_lr_ratio: 0.001
    }
  }
  enable_random_crop: True
  enable_center_crop: True
  enable_color_augmentation: False
  mixup_alpha: 0.2
  label_smoothing: 0.1
}
eval_config {
  eval_dataset_path: "/classifier/data/test/"
  #model_path: "/classifier/model/retrain/"
  top_k: 3
  batch_size: 256
  n_workers: 60
  enable_center_crop: False
}
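
For anyone reviewing the two specs side by side, a small script can list every key whose value differs. The following is a minimal sketch and not part of TAO: flatten_spec and diff_specs are hypothetical helpers, the parser only handles the simple "key: value" and "name { ... }" lines seen in the configs above, and the two strings are short excerpts of those configs. It only surfaces differences; it does not say which of them affect accuracy.

```python
def flatten_spec(text):
    """Flatten a protobuf-text style spec into {"section.key": "value"}."""
    path, flat = [], {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):               # entering a nested section
            path.append(line[:-1].strip())
        elif line == "}":                    # leaving a nested section
            path.pop()
        else:                                # plain "key: value" entry
            key, _, value = line.partition(":")
            flat[".".join(path + [key.strip()])] = value.strip().strip('"')
    return flat


def diff_specs(spec_a, spec_b):
    """Return {key: (value_in_a, value_in_b)} for keys that differ."""
    a, b = flatten_spec(spec_a), flatten_spec(spec_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Excerpts copied from the two configs posted above.
MOBILENET_V2 = """
model_config {
  arch: "mobilenet_v2"
  use_batch_norm: False
}
train_config {
  optimizer {
    sgd {
      lr: 0.1
      momentum: 0.01
      nesterov: True
    }
  }
  preprocess_mode: "torch"
}
"""

EFFICIENTNET_B0 = """
model_config {
  arch: "efficientnet_b0"
}
train_config {
  optimizer {
    sgd {
      lr: 0.01
      momentum: 0.9
      nesterov: False
    }
  }
  preprocess_mode: "caffe"
}
"""

differences = diff_specs(MOBILENET_V2, EFFICIENTNET_B0)
for key, (left, right) in differences.items():
    print(f"{key}: mobilenet_v2={left} efficientnet_b0={right}")
```

A value of None in the output means the key is set in only one of the two specs.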

Other info:

• Training Hardware: 2 x RTX A6000 GPU
• Network Type: MobilenetV2
• TLT Version: tao/tao-toolkit-tf: v3.21.11-tf1.15.5-py3

According to the Open Images Pre-trained Image Classification page (TAO Toolkit 3.22.02 documentation) and the TAO Pretrained Classification page on NVIDIA NGC,
the pretrained MobileNetV2 model has an accuracy of 72.75,
the pretrained ResNet50 model has an accuracy of 77.91.

Did you use the mobilenetv2 pretrained model in the training?

Hi @Morganh ,

Both models are trained from scratch with our own data.

We did not use any pre-trained models for training as per the configs above.

I think there is an issue with the mobilenetv2 config due to the vast discrepancy between the accuracies with the same dataset. I was hoping someone could review it?

Thanks!

Can you try Mobilenetv2 with tao/tao-toolkit-tf: v3.21.08-py3 docker?
Just want to narrow down.

Please set
use_batch_norm: True

and
enable_random_crop: True
enable_center_crop: True

Thanks @Morganh but changing those parameters hasn’t helped. Still getting 65.5% accuracy.

I will try v3.21.08 and report back.

Hi @Morganh how do you install previous versions of TAO toolkit?

You can pull the docker image from

or, for an older version, from Transfer Learning Toolkit for Video Streaming Analytics | NVIDIA NGC

Then run with
$ docker run --runtime=nvidia -it --rm -v localfolder:dockerfolder <docker> /bin/bash
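
To repeat the same experiment across several container versions, the pull and run commands can be generated per tag. This is a rough sketch under a few assumptions: the image name and tags are the ones quoted in this thread, docker_commands is a hypothetical helper, and /path/on/host and /workspace are placeholder paths. It only prints the commands and does not execute anything.

```python
import shlex

# Image name and tags as quoted in this thread.
IMAGE = "nvcr.io/nvidia/tao/tao-toolkit-tf"
TAGS = ["v3.21.08-py3", "v3.21.11-tf1.15.4-py3", "v3.21.11-tf1.15.5-py3"]


def docker_commands(tag, local_dir, container_dir):
    """Build the docker pull/run argument lists for one image tag."""
    image = f"{IMAGE}:{tag}"
    pull = ["docker", "pull", image]
    run = [
        "docker", "run", "--runtime=nvidia", "-it", "--rm",
        "-v", f"{local_dir}:{container_dir}",  # mount data and specs
        image, "/bin/bash",
    ]
    return pull, run


# Placeholder paths; replace with the real data/spec folder.
for tag in TAGS:
    pull, run = docker_commands(tag, "/path/on/host", "/workspace")
    print(shlex.join(pull))
    print(shlex.join(run))
```

Running the same spec file inside each container keeps the toolkit version as the only variable.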

@Morganh this must be a bug in the latest TAO Toolkit (v3.21.11-tf1.15.5-py3). I am getting 93% accuracy by the 2nd epoch for MobileNetV2 when using v3.21.08 directly from docker. It will not get past 65% on the latest version.

Unfortunately there is an issue. We will check further and follow up. Thanks for catching this.
Appreciate your work!

Hi,
If possible, can you try the v3.21.11-tf1.15.4-py3 docker?
$ docker pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3

@Morganh I can confirm the issue also exists in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3

After 2 epochs the accuracy is 41% with v3.21.11-tf1.15.4-py3, compared to 90% when using v3.21.08.

It’s actually considerably worse than the latest TAO Toolkit.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Hi,
Sorry for the late reply. After running experiments on 3.21.08, v3.21.11-tf1.15.5-py3 and v3.21.11-tf1.15.4-py3, I cannot find the mAP drop for the mobilenetv2 backbone.
I ran the experiments according to the spec file inside the Jupyter notebook.
The training ran on 20 classes of VOC dataset.
Please try to run on it if you have time. Thanks.