Wrong Offline Data Augmentation Documentation

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) A4000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) N/A
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) toolkit_version: 5.2.0
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

As the latest TAO doesn’t support rotation/shear online augmentation, I have to do offline augmentation. There are three issues:

  1. The Offline Data Augmentation page (NVIDIA Docs) linked from the 5.2 documentation uses the deprecated example command “tao augment”. It took me a while to figure out that tao dataset augmentation generate is the correct one. Please update this.

  2. Even with tao dataset augmentation generate, the arguments listed by tao dataset augmentation generate --help are very different from those on the Offline Data Augmentation page (NVIDIA Docs). Again, this is very misleading.

  3. After spending many hours, I figured out a working command, but I cannot get the YAML spec parsed correctly.

Below is the spec I tried, which didn’t work:

spatial_aug:
  rotation:
    angle: 5
    units: degrees
  shear:
    shear_ratio_x: 0.3
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false

It throws weird errors:

2024-01-04 02:35:24,610 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-01-04 02:35:24,691 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-data-services
2024-01-04 02:35:25,298 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
sys:1: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_ds/core/hydra/hydra_runner.py:105: UserWarning: 
'offline_data_augment.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
Error merging 'offline_data_augment.yaml' with schema
Invalid value assigned: AnyNode is not a ListConfig, list or tuple.
    full_key: spatial_aug.rotation.angle
    reference_type=RotationConfig
    object_type=RotationConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[740,1],0]
  Exit code:    1
--------------------------------------------------------------------------
Sending telemetry data.
Execution status: FAIL
2024-01-04 02:35:31,674 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Thanks for catching this. The information has not been updated for TAO 5.x; we will update it. Currently, it needs to be launched with tao dataset. You can also run docker run -it nvcr.io/nvidia/tao/tao-toolkit:5.2.0-data-services /bin/bash to log in to the container.
More information can be found in TAO Toolkit Launcher - NVIDIA Docs.

The help output for TAO 5.x is as follows:

augmentation -h
usage: augmentation [-h] -e EXPERIMENT_SPEC_FILE [--gpu_ids GPU_IDS] [--num_gpus NUM_GPUS] [-o OUTPUT_SPECS_DIR] [--mpirun_arg MPIRUN_ARG] [--launch_cuda_blocking] {generate}

Its source code is at tao_dataset_suite/nvidia_tao_ds/augment/entrypoint/augment.py (NVIDIA/tao_dataset_suite on GitHub).

According to tao_dataset_suite/nvidia_tao_ds/augment/config/default_config.py (NVIDIA/tao_dataset_suite on GitHub), angle needs to be a list. Please change angle to a list and retry.
You can refer to tao_tutorials/notebooks/tao_data_services/specs/augment.yaml (NVIDIA/tao_tutorials on GitHub) for an example spec.
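Applied to the spec posted above, the fix is a one-character change per field: write angle as a YAML sequence rather than a scalar. The sketch below writes such a corrected spec. Assumptions: the file name augment_spec.yaml is hypothetical, and shear_ratio_x is assumed to be list-typed like angle (the error only named angle, so check default_config.py before relying on this). All other keys are copied unchanged from the original post.

```shell
# Sketch: write a corrected offline augmentation spec.
# Assumption: list-typed schema fields (per default_config.py) need YAML sequences.
# The file name augment_spec.yaml is hypothetical.
cat > augment_spec.yaml <<'EOF'
spatial_aug:
  rotation:
    angle: [5]            # was `angle: 5`; the schema expects a list
    units: degrees
  shear:
    shear_ratio_x: [0.3]  # ASSUMED list-typed like angle; verify in default_config.py
data:
  dataset_type: coco
  image_dir: /workspace/tao/data/images
  anno_path: /workspace/tao/data/output.json
  output_dataset: /workspace/tao/data/out
  batch_size: 8
  include_masks: false
EOF
```

The spec would then be passed with the -e flag shown in the help output, e.g. tao dataset augmentation generate -e /path/to/augment_spec.yaml. When going through the TAO launcher, that path is presumably the container-side path of a directory mounted via ~/.tao_mounts.json, not the host path.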

For more information, refer to tao_tutorials/notebooks/tao_data_services/kitti.ipynb (NVIDIA/tao_tutorials on GitHub).

Thanks, I will give it a try.

A few other questions:

  • It is quite inconvenient to do rotation/shear augmentation offline. Do you have any plans to make it part of the online augmentation?
  • There are other very useful data augmentations, such as copy-paste. Are there any plans to add support for them?
  • Is that part of the code open source anywhere? Is there any chance the community can contribute?

All the code is open source. You can find the repositories listed at the bottom of the NVIDIA Corporation GitHub page, and the corresponding Docker images under TAO Toolkit on NVIDIA NGC.
Online augmentation is already implemented within each network's code.


Thanks, this is very helpful.

If I would like to use my own fork of NVIDIA/tao_tensorflow1_backend (TAO Toolkit deep learning networks with TensorFlow 1.x backend) with some customized enhancements, would I be able to build it myself and use TAO Toolkit to wrap it? Is there any documentation about this?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Yes, you can.
You can log in to the container, modify the files, and finally run docker commit to save your customized image.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.