Wrong Offline Data Augmentation Documentation

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) A4000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) N/A
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) toolkit_version: 5.2.0
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Since the latest TAO doesn’t support rotation/shear online augmentation, I have to do offline augmentation. There are three issues:

  1. The offline augmentation page linked from the 5.2 docs uses a deprecated example command, “tao augment”: Offline Data Augmentation - NVIDIA Docs. It took me a while to figure out that tao dataset augmentation generate is the correct one. Please update this.

  2. Even with tao dataset augmentation generate, running tao dataset augmentation generate --help shows args that are very different from what is documented here: Offline Data Augmentation - NVIDIA Docs. Again, this is very misleading.

  3. After spending many hours, I figured out a working command, but I cannot get the YAML parsed correctly.

Below is the spec I tried, which didn’t work:

spatial_aug:
  rotation:
    angle: 5
    units: degrees
  shear:
    shear_ratio_x: 0.3
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false

It throws weird errors:

2024-01-04 02:35:24,610 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-01-04 02:35:24,691 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-data-services
2024-01-04 02:35:25,298 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
sys:1: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
sys:1: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/core/hydra/hydra_runner.py:105: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/core/hydra/hydra_runner.py:105: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
Error merging 'offline_data_augment.yaml' with schema
Invalid value assigned: AnyNode is not a ListConfig, list or tuple.
    full_key: spatial_aug.rotation.angle
    reference_type=RotationConfig
    object_type=RotationConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Error merging 'offline_data_augment.yaml' with schema
Invalid value assigned: AnyNode is not a ListConfig, list or tuple.
    full_key: spatial_aug.rotation.angle
    reference_type=RotationConfig
    object_type=RotationConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[740,1],0]
  Exit code:    1
--------------------------------------------------------------------------
Sending telemetry data.
Execution status: FAIL
2024-01-04 02:35:31,674 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Thanks for catching this. The documentation has not been updated for TAO 5.x; we will update it. Currently, augmentation must be launched through tao dataset. You can also run docker run nvcr.io/nvidia/tao/tao-toolkit:5.2.0-data-services to log in to the container.
Other info can be found in TAO Launcher — Tao Toolkit.

The documented arguments have not been updated for TAO 5.x either. The current help output is as follows.

augmentation -h
usage: augmentation [-h] -e EXPERIMENT_SPEC_FILE [--gpu_ids GPU_IDS] [--num_gpus NUM_GPUS] [-o OUTPUT_SPECS_DIR] [--mpirun_arg MPIRUN_ARG] [--launch_cuda_blocking] {generate}

Its source code is https://github.com/NVIDIA/tao_dataset_suite/blob/main/nvidia_tao_ds/augment/entrypoint/augment.py.

The angle needs to be a list, according to https://github.com/NVIDIA/tao_dataset_suite/blob/main/nvidia_tao_ds/augment/config/default_config.py#L34. Please change angle to a list and retry.
You can refer to https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_data_services/specs/augment.yaml

For more information, refer to
https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_data_services/kitti.ipynb.
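
For reference, the spec from the original post with the suggested change applied might look like the sketch below. Only the list type of angle is confirmed by the error message and the linked default_config.py. Writing shear_ratio_x as a list too is an assumption that the schema treats it the same way, so check it against the linked augment.yaml. The paths are the ones from the original post.

```yaml
spatial_aug:
  rotation:
    # The schema expects a list here, so a single angle is wrapped in brackets.
    angle: [5]
    units: degrees
  shear:
    # Assumption: shear ratios are list-typed like angle; not confirmed in this thread.
    shear_ratio_x: [0.3]
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false
```

If the merge error then moves on to a different full_key, that field likely has the same scalar-versus-list mismatch.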

Thanks, I will give it a try.

A few more questions:

  • It is quite inconvenient to do rotation/shear augmentation offline. Do you have any plans to make it part of the online operations?
  • There are other very useful data augmentations, such as copy-paste. Are there any plans to add support for them?
  • Is that part of the code open source anywhere? Is there any chance the community can contribute?

All of the code is open source. You can find it at the bottom of NVIDIA Corporation · GitHub. You can find the corresponding Docker images in TAO Toolkit | NVIDIA NGC.
Online augmentations are already included in the networks.

Thanks, this is very helpful.

If I would like to use my own fork of GitHub - NVIDIA/tao_tensorflow1_backend: TAO Toolkit deep learning networks with TensorFlow 1.x backend, with some customized enhancements, would I be able to build it myself and use the TAO Toolkit to wrap it? Is there any documentation about this?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Yes, you can.
You can log in to the container, modify the files, and then run docker commit to save your custom image.