Question about the augmentation section of training configurations

Lapino · November 10, 2020, 8:43am

In the preprocessing part of the augmentation section, there are cropping options such as crop_right, crop_bottom, etc., which, according to the TLT documentation, can take values from 0 to the input image width/height. However, when I tried crop sizes different from output_image_width/height, I always got an error like this:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(64, 128, 26, 26), (64, 256, 25, 25)]

So I’m confused about the crop size: does it necessarily have to match output_image_width/height?

Morganh · November 10, 2020, 9:22am

Which network are you training?

Lapino · November 10, 2020, 9:25am

yolov3

Morganh · November 10, 2020, 9:44am

Please note that for yolo_v3, according to the TLT user guide, the width/height should be multiples of 32.

Lapino · November 10, 2020, 9:56am

Even for the crop_size? The documentation says the augmentation module will finally crop or pad the image to fit output_image_size, so in my opinion the crop_size doesn’t influence what is fed to the network. And for YOLO, I always set output_image_size to 416 * 416.

Morganh · November 11, 2020, 10:00am

Currently, if crop_size is not a multiple of 32, you get the error you mentioned.
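The mismatch in the error can be reproduced with simple shape arithmetic. The sketch below assumes a YOLOv3-style backbone with five stride-2, 'same'-padded convolutions (each producing ceil(input / 2)) and a 2× upsampling of the stride-32 map before it is concatenated with the stride-16 map. It also assumes the crop size is what actually reaches the network, as the error suggests. It is an illustration, not TLT's implementation, and the function names are hypothetical.

```python
import math


def conv_out(size: int, stride: int) -> int:
    # 'same'-padded convolution: output size = ceil(input / stride)
    return math.ceil(size / stride)


def yolo_concat_shapes(input_size: int):
    """Return (upsampled stride-32 map size, stride-16 map size) for a square input.

    Hypothetical helper: assumes five stride-2 downsamplings and a 2x upsample
    of the deepest map before concatenation with the stride-16 map.
    """
    size = input_size
    sizes = {}
    stride = 1
    for _ in range(5):  # strides 2, 4, 8, 16, 32
        size = conv_out(size, 2)
        stride *= 2
        sizes[stride] = size
    upsampled = sizes[32] * 2  # 2x upsampling of the stride-32 feature map
    return upsampled, sizes[16]


print(yolo_concat_shapes(416))  # (26, 26)
print(yolo_concat_shapes(400))  # (26, 25)
```

With 416 both maps come out at 26×26, while an input of, say, 400 gives 26 vs. 25, the same pair of spatial sizes seen in the error message. A multiple of 32 divides evenly at every stage, so the upsampled map always lines up with the one it is concatenated to.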

Lapino · November 11, 2020, 11:06am

OK, thanks for the clarification! What about other networks, such as RetinaNet: are there any specific constraints on those sizes?

Morganh · November 12, 2020, 9:09am

RetinaNet has the same constraint.
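Since both YOLOv3 and RetinaNet are subject to this constraint, a quick check on the spec values can catch the problem before training starts. The helper below is hypothetical (not part of TLT) and assumes the crop size is (crop_right - crop_left) by (crop_bottom - crop_top), with crop_left and crop_top treated as 0 when unset.

```python
from typing import List


# Hypothetical pre-flight check, not a TLT API. Assumes the crop size is
# (crop_right - crop_left) x (crop_bottom - crop_top).
def check_crop(crop_right: int, crop_bottom: int,
               crop_left: int = 0, crop_top: int = 0,
               multiple: int = 32) -> List[str]:
    """Return a list of problems; an empty list means the crop looks usable."""
    problems = []
    width = crop_right - crop_left
    height = crop_bottom - crop_top
    for name, value in (("width", width), ("height", height)):
        if value <= 0:
            problems.append(f"crop {name} {value} must be positive")
        elif value % multiple != 0:
            problems.append(f"crop {name} {value} is not a multiple of {multiple}")
    return problems


print(check_crop(416, 416))  # []
print(check_crop(400, 416))  # ['crop width 400 is not a multiple of 32']
```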