Input Shape for Yolov4 Model

leviethung1280 · June 8, 2021, 8:23am

I’ve trained a License Plate Detection model using YOLOv4 with ResNet18 as the backbone. In the config file, I see that output_width and output_height are 1248 and 384 respectively. Do these values matter when we run inference?

Suppose that I have a camera stream at full HD resolution (1920x1080). Should I change the default values in the config file?

augmentation_config {
  hue: 0.1
  saturation: 1.5
  exposure: 1.5
  vertical_flip: 0
  horizontal_flip: 0.1
  jitter: 0.3
  output_width: 1248
  output_height: 384
  randomize_input_shape_period: 0
  mosaic_prob: 0.5
  mosaic_min_ratio: 0.2
}

Morganh · June 8, 2021, 8:34am

The output_width and output_height are not related to inference. They reflect the model size you want to train.

leviethung1280 · June 8, 2021, 9:03am

Sorry, I did not understand the term "model size". Do you mean that this is the input shape of the model during training? Does it require all the training images to have the same shape?

Morganh · June 8, 2021, 9:12am

Yes, it is the input shape of the model.
For yolo_v4, see the Transfer Learning Toolkit 3.0 documentation. It does not require all the training images to have the same shape.
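
To illustrate why training images of different shapes can share one model input shape, here is a minimal sketch of the arithmetic involved. It assumes a plain stretch resize to output_width x output_height with no aspect-ratio preservation; the toolkit's actual augmentation pipeline (jitter, mosaic, and so on) is more involved and is not reproduced here. The function name `resize_box`, the constant `MODEL_INPUT`, and the sample boxes are hypothetical and only serve the example.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]

# Values taken from output_width / output_height in the config above.
MODEL_INPUT = (1248, 384)


def resize_box(box: Box, src_size: Tuple[int, int], dst_size: Tuple[int, int]) -> Box:
    """Scale a box from an image of src_size (w, h) to dst_size (w, h).

    Assumption: the image is stretched independently along each axis,
    so each coordinate is multiplied by its axis scale factor.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    scale_x = dst_w / src_w
    scale_y = dst_h / src_h
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)


if __name__ == "__main__":
    # Two hypothetical training images with different shapes.
    samples = [
        ((1920, 1080), (960.0, 540.0, 1120.0, 600.0)),
        ((1280, 720), (640.0, 360.0, 800.0, 420.0)),
    ]
    for src_size, box in samples:
        print(src_size, "->", MODEL_INPUT, resize_box(box, src_size, MODEL_INPUT))
```

Both sample images end up at the same 1248x384 shape, with their labels scaled to match, which is why the source resolution of each image does not need to be identical.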