Ability to augment with random crops?

Playing around with TLT v3, and it’s very promising so far!

One thing that is confusing me is how to generate random crops with the Augmentation Module. I am trying to modify the Faster RCNN and DetectNet_v2 default specs.

Let’s say I have many images of size 1920x1080, but I want to train on random 512x512 crops of the original images. What is being cropped if I set preprocessing.output_image_width and height to 512? Is it just taking a center crop every time? Would I be able to get the behavior I need by adjusting spatial_augmentation.translate_max_x and y to be 1920/2 and 1080/2 respectively?
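The behavior being asked for is a new 512x512 window at a random position in every 1920x1080 image, with the labels moved and clipped to match. It can be sketched in plain Python. This is a hypothetical helper for illustration, not part of TLT. `random_crop_window` and `crop_boxes` are made-up names, and boxes are assumed to be `(x1, y1, x2, y2)` pixel tuples.

```python
import random

def random_crop_window(img_w, img_h, crop_w, crop_h, rng=random):
    """Pick a crop_w x crop_h window uniformly at random inside an img_w x img_h image.

    Returns (left, top, right, bottom) in pixel coordinates.
    """
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop is larger than the image")
    left = rng.randint(0, img_w - crop_w)  # randint includes both endpoints
    top = rng.randint(0, img_h - crop_h)
    return left, top, left + crop_w, top + crop_h

def crop_boxes(boxes, window):
    """Shift (x1, y1, x2, y2) boxes into window coordinates and clip them.

    Boxes with no area left inside the window are dropped.
    """
    left, top, right, bottom = window
    out = []
    for x1, y1, x2, y2 in boxes:
        nx1 = max(x1, left) - left
        ny1 = max(y1, top) - top
        nx2 = min(x2, right) - left
        ny2 = min(y2, bottom) - top
        if nx2 > nx1 and ny2 > ny1:
            out.append((nx1, ny1, nx2, ny2))
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    window = random_crop_window(1920, 1080, 512, 512, rng)
    print(window)
    print(crop_boxes([(100, 100, 300, 250), (1500, 900, 1700, 1050)], window))
```

Sampling `left` and `top` independently gives every valid window position the same probability.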

augmentation_config {
  preprocessing {
    output_image_width: 512
    output_image_height: 512
  }
  spatial_augmentation {
    zoom_min: 1.0
    zoom_max: 1.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
}
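One way to reason about the `translate_max_x: 960` idea is to compute which source columns end up inside a 512-pixel-wide output after a horizontal shift `tx`. The sketch below makes several assumptions that the documentation quoted in this thread does not confirm: `tx` is drawn uniformly from `[-translate_max_x, translate_max_x]`, zoom is 1.0, and the shift is applied to the full 1920-pixel-wide image before it is clipped on the right to the output width. `visible_columns` and `padded_fraction` are hypothetical helpers.

```python
import random

def visible_columns(tx, img_w, out_w):
    """Source columns [start, end) that land inside [0, out_w) after a shift of tx.

    A source column x moves to x + tx, so it is kept when 0 <= x + tx < out_w
    and 0 <= x < img_w. Returns None if no source column is visible.
    """
    start = max(0.0, -tx)
    end = min(float(img_w), out_w - tx)
    return (start, end) if end > start else None

def padded_fraction(tx, img_w, out_w):
    """Fraction of the output width with no source pixel (zero padding)."""
    window = visible_columns(tx, img_w, out_w)
    if window is None:
        return 1.0
    return 1.0 - (window[1] - window[0]) / out_w

if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed symmetric sampling range for translate_max_x = 960.
    shifts = [rng.uniform(-960.0, 960.0) for _ in range(10000)]
    windows = [visible_columns(t, 1920, 512) for t in shifts]
    rightmost = max(w[1] for w in windows if w is not None)
    padded = sum(padded_fraction(t, 1920, 512) > 1e-9 for t in shifts) / len(shifts)
    print(f"rightmost source column reached: {rightmost:.0f}")  # never above 1472
    print(f"share of samples with zero padding: {padded:.2f}")  # roughly half
```

Under those assumptions, a symmetric range of ±960 never exposes columns past 1472, and every positive shift leaves zero padding on the left, so it would not behave like a uniform random crop.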

For more details on augmentation, see the DetectNet_v2 page of the Transfer Learning Toolkit 3.0 documentation.

The dataloader online augmentation pipeline applies spatial and color-space augmentation transformations in the following order:

  1. The dataloader first performs the pre-processing operations on the input data (images and labels) read from the tfrecords files. Here the images and labels are cropped and scaled based on the parameters in the preprocessing config. The boundaries of the cropped image and labels within the original image are defined by the crop_left, crop_right, crop_top, and crop_bottom parameters. The cropped data is then scaled by the factors defined by scale_height and scale_width. The transformation matrices for these operations are computed globally and do not change per image.
  2. The tensors generated by the pre-processing blocks are then passed through a pipeline of random augmentations in the spatial and color domains. Spatial augmentations are applied to both images and label coordinates, while color augmentations are applied only to images. To apply color augmentations, the output_image_channel parameter must be set to 3; for monochrome tensors, color augmentations are not applied. The spatial and color transformation matrices are computed per image, with values sampled uniformly between the minimum and maximum limits defined by the spatial_augmentation and color_augmentation config parameters.
  3. Once the spatially and color-augmented input tensors are generated, the output is padded with zeros or clipped along the right and bottom edges of the image to fit the output dimensions defined in the preprocessing config.
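The three steps above can be traced on bounding-box coordinates with a simplified model. This is a sketch, not TLT's implementation: the function names are hypothetical, zoom is applied about the origin rather than whatever center TLT actually uses, translations are assumed to be drawn from `[-translate_max, translate_max]`, and color augmentation is left out because it does not move labels.

```python
import random

def clip_box(box, w, h):
    """Clip (x1, y1, x2, y2) to [0, w] x [0, h]; return None if no area remains."""
    x1, y1, x2, y2 = box
    x1, x2 = max(0.0, min(x1, w)), max(0.0, min(x2, w))
    y1, y2 = max(0.0, min(y1, h)), max(0.0, min(y2, h))
    return (x1, y1, x2, y2) if x2 > x1 and y2 > y1 else None

def preprocess(box, crop, scale):
    """Step 1: fixed crop, then scale; the same parameters apply to every image."""
    left, top, right, bottom = crop
    x1, y1, x2, y2 = box
    shifted = clip_box((x1 - left, y1 - top, x2 - left, y2 - top),
                       right - left, bottom - top)
    if shifted is None:
        return None
    sx, sy = scale
    x1, y1, x2, y2 = shifted
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

def sample_spatial(zoom_min, zoom_max, translate_max_x, translate_max_y, rng):
    """Step 2a: draw one zoom and one translation per image from uniform ranges."""
    zoom = rng.uniform(zoom_min, zoom_max)
    tx = rng.uniform(-translate_max_x, translate_max_x)  # assumed symmetric range
    ty = rng.uniform(-translate_max_y, translate_max_y)
    return zoom, tx, ty

def apply_spatial(box, zoom, tx, ty):
    """Step 2b: zoom about the origin, then translate (simplified affine)."""
    x1, y1, x2, y2 = box
    return (x1 * zoom + tx, y1 * zoom + ty, x2 * zoom + tx, y2 * zoom + ty)

def run_pipeline(boxes, crop, scale, spatial, out_size, rng):
    """Apply steps 1-3 to all boxes of one image; boxes with no area left are dropped."""
    zoom, tx, ty = sample_spatial(*spatial, rng)  # sampled once per image
    out_w, out_h = out_size
    result = []
    for box in boxes:
        b = preprocess(box, crop, scale)
        if b is None:
            continue
        b = clip_box(apply_spatial(b, zoom, tx, ty), out_w, out_h)  # step 3
        if b is not None:
            result.append(b)
    return result

if __name__ == "__main__":
    boxes = [(100, 100, 300, 250), (600, 600, 700, 700)]
    print(run_pipeline(boxes, (0, 0, 1920, 1080), (1.0, 1.0),
                       (1.0, 1.0, 0.0, 0.0), (512, 512), random.Random(0)))
    # [(100.0, 100.0, 300.0, 250.0)]
```

With the crop covering the full image, zoom fixed at 1.0, and zero translation, only boxes inside the top-left 512x512 region survive step 3, which follows from the right/bottom clipping described above.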

Yes, when I read that, I was confused by the crop transform (in part 1) vs. the spatial_augmentation transforms (in part 2).

The crop parameters are fixed values, not ranges, so how can we apply a different crop location to each image? Also, what is the default crop if the values are left unspecified: the upper-left corner of the image, or the center?

Second, the spatial_augmentation in part 2 comes after the crop. Doesn't that lose information from the original image, with the newly empty pixels just padded with zeros?

Is there any way to preview what the augmentations will output? I see there is an Offline Data Augmentation tool, but its parameters don't match the augmentations available in the Faster RCNN or DetectNet_v2 pipelines (e.g., it lacks probabilistic augmentations and cropping).

The crop parameters are not fixed values; they accept a range of values. See the DetectNet_v2 page of the Transfer Learning Toolkit 3.0 documentation.

If they are left unspecified, the defaults are crop_left=0, crop_right=image_width, crop_top=0, and crop_bottom=image_height.
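The default rule above can be written as a small function. This is a hypothetical sketch, not TLT code: `resolve_crop` is a made-up name, `None` stands for "left unspecified" (how the spec parser actually encodes unset fields is not covered here), and the validity check assumes the window must lie inside the image and have positive size.

```python
def resolve_crop(image_width, image_height, crop_left=None, crop_right=None,
                 crop_top=None, crop_bottom=None):
    """Fill unspecified crop bounds with full-image defaults, then validate them.

    Returns (crop_left, crop_right, crop_top, crop_bottom).
    """
    left = 0 if crop_left is None else crop_left
    right = image_width if crop_right is None else crop_right
    top = 0 if crop_top is None else crop_top
    bottom = image_height if crop_bottom is None else crop_bottom
    # Assumed check: a non-empty window that stays inside the image.
    if not (0 <= left < right <= image_width and 0 <= top < bottom <= image_height):
        raise ValueError(f"invalid crop window {(left, right, top, bottom)} "
                         f"for a {image_width}x{image_height} image")
    return left, right, top, bottom

if __name__ == "__main__":
    print(resolve_crop(1920, 1080))  # (0, 1920, 0, 1080)
    print(resolve_crop(1920, 1080, crop_left=100, crop_right=612))  # (100, 612, 0, 1080)
```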

Yes. As described in step 3 above, the output is padded with zeros or clipped along the right and bottom edges of the image to fit the output dimensions defined in the preprocessing config.
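The pad-or-clip behavior of step 3 can be illustrated on a small image stored as a list of rows. `fit_to_output` is a hypothetical helper written for this illustration: it always keeps the top-left corner, drops extra columns and rows on the right and bottom, and fills missing ones with zeros.

```python
def fit_to_output(image, out_h, out_w, fill=0):
    """Pad with `fill` or clip along the right and bottom edges to get out_h x out_w.

    `image` is a list of rows (H x W); the top-left corner is always preserved.
    """
    # Clip extra rows and columns, and pad short rows on the right.
    rows = [list(row[:out_w]) + [fill] * max(0, out_w - len(row))
            for row in image[:out_h]]
    # Pad missing rows at the bottom.
    rows += [[fill] * out_w for _ in range(max(0, out_h - len(rows)))]
    return rows

if __name__ == "__main__":
    print(fit_to_output([[1, 2, 3], [4, 5, 6]], 3, 2))  # [[1, 2], [4, 5], [0, 0]]
```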

There is no tool for visualizing the augmentations.

More info can be found in the forum thread "Experiment Spec File: meaning of zoom_min and zoom_max" (post #4 by Morganh).