Question about the augmentation section of the training configuration

In the preprocessing part of the augmentation section there are cropping options such as crop_right and crop_bottom. According to the TLT documentation, these can take any value from 0 up to the input image width/height. However, whenever I set a crop size different from output_image_width/output_image_height, I get an error like this:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(64, 128, 26, 26), (64, 256, 25, 25)]

So I’m confused about the crop size: does it have to be the same as output_image_width/output_image_height?

Which network are you training?


Please note that for yolo_v3, according to the TLT user guide, the width and height must be multiples of 32.
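
A quick way to check a size against the multiple-of-32 rule before launching training is a small helper like the one below. This is only a sketch: the function names are made up for illustration and are not part of TLT.

```python
# Hypothetical helpers (not part of TLT) for checking the multiple-of-32 rule.

def is_multiple_of(value: int, base: int = 32) -> bool:
    """Return True if value is a positive multiple of base."""
    return value > 0 and value % base == 0


def round_down_to_multiple(value: int, base: int = 32) -> int:
    """Largest multiple of base that is less than or equal to value."""
    return (value // base) * base


def round_up_to_multiple(value: int, base: int = 32) -> int:
    """Smallest multiple of base that is greater than or equal to value."""
    return -(-value // base) * base


if __name__ == "__main__":
    for size in (416, 400):
        print(size, is_multiple_of(size),
              round_down_to_multiple(size), round_up_to_multiple(size))
```

For a size that fails the check, the nearest valid values on either side are what the two rounding helpers return.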

Even for the crop_size? The documentation says that the augmentation module will finally crop or pad the image to fit the output_image_size, so in my opinion the crop_size shouldn’t influence what is given to the network. For YOLO, I always set the output_image_size to 416 x 416.

Currently, if crop_size is not a multiple of 32, you get the error you mentioned.
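
One plausible reading of the 26 vs. 25 mismatch in the error can be reproduced with plain arithmetic. The sketch below is not the TLT implementation. It assumes that the cropped size is what actually reaches the network, that each downsampling step halves the spatial size and rounds up (as "same" padding with stride 2 would), and that a stride-32 feature map is upsampled 2x and concatenated with a stride-16 feature map. The value 400 is just an example of a size that is not a multiple of 32.

```python
import math


def feature_size(input_size: int, num_halvings: int) -> int:
    """Spatial size after num_halvings stride-2 steps, rounding up each time
    (assumed 'same'-padding behaviour, not confirmed for TLT)."""
    size = input_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size


def concat_branch_sizes(input_size: int) -> tuple:
    """Return (upsampled_branch, skip_branch) spatial sizes at the concat."""
    skip = feature_size(input_size, 4)           # stride-16 feature map
    upsampled = 2 * feature_size(input_size, 5)  # stride-32 map, upsampled 2x
    return upsampled, skip


if __name__ == "__main__":
    for size in (416, 400):
        upsampled, skip = concat_branch_sizes(size)
        status = "ok" if upsampled == skip else "shape mismatch"
        print(f"{size}: upsampled={upsampled}, skip={skip} -> {status}")
```

Under these assumptions, 416 gives 26 on both branches, while 400 gives 26 on the upsampled branch and 25 on the skip branch, which is the same pair of spatial sizes that appears in the Concatenate error.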

OK, thanks for the clarification! As for other networks, such as RetinaNet, are there any specific constraints on those sizes?

RetinaNet has the same constraint.