TAO 5.0 Training Spec discrepancy

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Classification_tf2 and AutoML
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v5.0.0
• Training spec file(If have, please share here) tao-getting-started_v5.0.0/notebooks/tao_launcher_starter_kit/classification_tf2/tao_voc/specs/spec.yaml AND https://github.com/NVIDIA/tao_front_end_services/blob/main/api/specs_utils/specs/classification_tf2/classification_tf2%20-%20train.csv
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

For the same classification_tf2 EfficientNet_B0 model, there appear to be two different ways to specify the training spec.

  1. Mentioned in tao-getting-started_v5.0.0/notebooks/tao_launcher_starter_kit/classification_tf2/tao_voc/specs/spec.yaml

There are choices for the LR schedule (cosine, step, etc.), and they are listed under train.lr_config.
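For reference, the relevant section of that spec.yaml looks roughly like this (field names and values recalled from the 5.0 notebook spec; treat them as illustrative, not authoritative):

```yaml
train:
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'    # other schedulers such as 'step' are selectable here
    learning_rate: 0.05
    soft_start: 0.05
```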


  2. Mentioned in the latest GitHub repo: https://github.com/NVIDIA/tao_front_end_services/blob/main/api/specs_utils/specs/classification_tf2/classification_tf2%20-%20train.csv


Here I don’t see any choices for the LR schedule, and train.lr_config doesn’t seem to exist anymore. I also see new options under train.optim_config, but no way to tell which belong to which type of optimizer (e.g. SGD, Adam, Adadelta).

Here are my questions.

  1. I was able to train a model with both specs, but which spec file should be followed for consistency? I am guessing the newer one, given its recency.

  2. For the new one, there is no documentation on which hyperparameter choices are available. Is there anywhere I can look this up?

For TAO 5.0, the spec files without the TAO API can be found at https://github.com/NVIDIA/tao_tensorflow2_backend/tree/main/nvidia_tao_tf2/cv/classification/experiment_specs .
The latest 5.0 spec file should be available soon.
In the meantime, the source code is available: https://github.com/NVIDIA/tao_tensorflow2_backend/blob/main/nvidia_tao_tf2/cv/classification/config/default_config.py#L60

Thanks for this.
So the LR config cannot be set via AutoML at all? I don’t see it in here.

Also, there is both a train.optim_config.lr and a train.lr_config.learning_rate. I am guessing the latter overrides the former. Yet the former is included in the AutoML hyperparameters by default, and there is no mention of whether the latter can be used at all.
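A minimal sketch of the two overlapping fields in question (parameter names taken from the discussion above; which one takes effect is exactly what is undocumented):

```yaml
train:
  optim_config:
    lr: 0.001            # present in the new CSV; enabled for AutoML by default
  lr_config:
    learning_rate: 0.05  # presumably overrides optim_config.lr, but this is unconfirmed
```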

By default, optimizer_config.beta_1 and optimizer_config.nesterov are both enabled for AutoML, but no optimizer actually uses both of these parameters simultaneously. Assuming the default optimizer is SGD, what is the point of having beta_1 enabled? Similarly, if the optimizer is Adam, what is the point of having nesterov enabled? Or am I misunderstanding something?
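To make the mismatch concrete, here is a rough mapping of which optim_config parameters each optimizer would actually consume. This is assumed from the usual Keras optimizer signatures, not from TAO documentation, so field names may differ:

```yaml
train:
  optim_config:
    optimizer: 'sgd'   # assumed default
    # consumed by SGD only:
    momentum: 0.9
    nesterov: True
    # consumed by Adam only:
    beta_1: 0.9
    beta_2: 0.999
    # consumed by Adadelta only:
    rho: 0.95
```

Under this reading, enabling beta_1 and nesterov together for AutoML means at least one of them is always a no-op for the chosen optimizer.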

And why doesn’t AutoML search through optimizer_config.momentum, optimizer_config.decay, and reg_config.weight_decay in v5.0? It did until v4.0.2. Is it no longer considered helpful?

The CSV file is missing the LR config parameters. As for the optimizer parameters currently in the CSV file, users can enable/disable them according to the optimizer in use.

By the way, as a workaround: since the repo is open source, users can decide what they want to enable. Modify the CSV files and then build a new container.
https://github.com/NVIDIA/tao_front_end_services. See the docker_build target in the Makefile for the docker build step.

Do you know when the spec file for 5.0 will be updated?

For the 5.0 spec file, you can download the latest notebook.
Refer to the TAO Toolkit Quick Start Guide - NVIDIA Docs:

  wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/tao/tao-getting-started/versions/5.0.0/zip -O getting_started_v5.0.0.zip
  unzip -u getting_started_v5.0.0.zip  -d ./getting_started_v5.0.0 && rm -rf getting_started_v5.0.0.zip && cd ./getting_started_v5.0.0

or TAO Toolkit Getting Started | NVIDIA NGC

The 5.0 user guide is also already available: Object Detection - NVIDIA Docs

It says v4.0.1, has no information about Image Classification using PyTorch, and no explanation of the new hyperparameter choices for AutoML.

The AutoML parameters aren’t dependent on the notebook, right? Unless changes are made to the GitHub repo here, users can’t use the LR config parameters for AutoML, correct? That was what my question was about. When do you reckon the documentation and the AutoML fix will come in?

Please try again. We fixed the issue and it is 5.0.0 now.

Yes, in the current version the LR config is missing. Users can use the workaround mentioned above to build a new container. We will fix it in the next release.
