Experiment Spec File: meaning of zoom_min and zoom_max

I am creating an experiment spec file for training an object detection model using TLT. However, I do not fully understand some of the parameters. What exactly do zoom_min and zoom_max mean?

I think that values above 1 do cropping (just like tf.image.crop_and_resize), and values below 1 do padding, but it’s not clear enough to me.

For instance, using the following configuration, do I enlarge or shrink the bounding boxes?

augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 544
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 1.0
    zoom_max: 2.0
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
}

Below are the descriptions.

zoom_min/max (float): Minimum/maximum zoom ratios. Set min = max = 1 to keep original size.
translate_max_x/y (float): Maximum translation along the x/y axis in pixel values.

Zoom operations are random.
Zoom operations stretch from (0,0) toward output_image_width/output_image_height.
If zoom is randomly set to below 1, you can consider it as ‘zooming out’ (image gets rendered smaller than the canvas).
If zoom is randomly set to above 1, you can consider it as ‘zooming in’ (image gets rendered bigger than the canvas).
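
To make the zoom behaviour described above concrete, here is a minimal sketch. It assumes the ratio is drawn uniformly from [zoom_min, zoom_max] and applied as a scale about the origin (0,0). The function names and the uniform sampling are assumptions for illustration only, not the actual TLT implementation.

```python
import random


def sample_zoom(zoom_min, zoom_max, rng=random):
    """Draw one zoom ratio; uniform sampling is an assumption."""
    return rng.uniform(zoom_min, zoom_max)


def zoom_bbox(bbox, zoom):
    """Scale an (x1, y1, x2, y2) box about the origin (0, 0)."""
    return tuple(coord * zoom for coord in bbox)


def rendered_size(width, height, zoom):
    """Size of the image after zooming, before the final crop to the canvas."""
    return width * zoom, height * zoom


# With zoom_min: 1.0 and zoom_max: 2.0 the ratio is never below 1,
# so boxes are either unchanged or enlarged, never shrunk.
zoom = sample_zoom(1.0, 2.0)
print(zoom_bbox((100.0, 50.0, 200.0, 150.0), zoom))
print(rendered_size(960, 544, zoom))
```

Under these assumptions, the configuration in the question (zoom_min: 1.0, zoom_max: 2.0) only ever zooms in, so bounding boxes get larger or stay the same.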

Thank you @Morganh, now it’s clear to me. However, I have another question related to your image: is the zoomed-out image always extracted from the top left corner? Do I need to set large values of translate_max_x/y in order to get crops of other regions of the image?

Yes, zoom operations stretch from (0,0) toward output_image_width/output_image_height.

Translate operations are also random. Please keep in mind that translate and zoom are mutually independent.

The order of the spatial augmentation is:
Crop → flip (if any) → rotate (if any) → zoom (if any) → translate (if any) → crop to output_image_width/output_image_height

The first crop is the one defined by “crop_right” and “crop_bottom”. The second crop is the one that brings the image to the output dimensions.
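
One way to picture this ordering is as a chain of 3x3 homogeneous matrices, where each later step multiplies on the left of the earlier ones. The sketch below is an illustration under that assumption, with hypothetical helper names; it is not the toolkit's code, and it leaves out rotation and both crops for brevity.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def hflip(width):
    """Mirror x coordinates within an image of the given width."""
    return [[-1.0, 0.0, float(width)], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def scale(zoom):
    """Zoom about the origin (0, 0)."""
    return [[zoom, 0.0, 0.0], [0.0, zoom, 0.0], [0.0, 0.0, 1.0]]


def translate(tx, ty):
    """Shift by (tx, ty) pixels."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def compose(steps):
    """Combine steps listed in application order into one matrix."""
    matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for step in steps:
        matrix = matmul(step, matrix)  # later steps multiply on the left
    return matrix


def apply(matrix, x, y):
    """Map a pixel coordinate through the combined matrix."""
    point = (x, y, 1.0)
    px, py = (sum(row[k] * point[k] for k in range(3)) for row in matrix[:2])
    return px, py


# flip -> zoom -> translate, in the order listed above.
m = compose([hflip(960), scale(2.0), translate(8.0, -8.0)])
print(apply(m, 100.0, 50.0))  # (1728.0, 92.0)
```

In this example the point ends up at x = 1728, beyond a 960-pixel-wide canvas, so it would fall outside the image after the final crop to the output dimensions.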

Great! I just have one more question… Is the first crop fixed to 0:crop_bottom, 0:crop_right, or does it take random values between 0 and crop_bottom/crop_right? I mean, is it an augmentation or a global cropping of the dataset?

The first crop operation is a preprocessing operation that extracts the region of interest you would like to train on from each image in the dataset. You can set crop_left, crop_top, crop_right and crop_bottom to define the left, top, right and bottom edges of the cropped ROI in the preprocessing section of the augmentation_config.
More info in Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation
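
As a rough illustration of a fixed ROI crop of this kind, the sketch below applies the same four edges to every image, with no randomness involved. The helper name is hypothetical, and treating the right and bottom edges as exclusive is an assumption, not something confirmed by the documentation.

```python
def crop_roi(image, crop_left, crop_top, crop_right, crop_bottom):
    """Cut a fixed region of interest out of an image.

    `image` is a list of pixel rows. The right and bottom edges are
    treated as exclusive here, which is an assumption.
    """
    return [row[crop_left:crop_right] for row in image[crop_top:crop_bottom]]


# A tiny 4x6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# The same edges are used for every image in the dataset.
roi = crop_roi(image, crop_left=1, crop_top=1, crop_right=5, crop_bottom=3)
print(roi)  # [[11, 12, 13, 14], [21, 22, 23, 24]]
```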

Perfect, thank you very much, now it is resolved.

@Morganh @motyaedu

I see that the docs mention:

The net tensors generated from the pre-processing blocks are then passed through a pipeline of random augmentations in spatial and color domains.

I have a few questions regarding augmentation_config:

  1. Does this augmentation just modify the original input images, or does it add new images after applying the specified operations?
  2. Are all enabled spatial augmentation and color augmentation operations applied at once, sequentially, for every single image?

  1. It just applies the specified operations to the original input images.
  2. The spatial and color transformation matrices are computed per image.