Network Image Input Resizing

I have noticed in the documentation and in previous forum posts that for detection networks it is required to resize the images prior to training. However, I have not been doing this so far and the network has been training fine with fairly high accuracy. I also recently resized the images offline to the input size given in the network spec, fed them in, and noticed only a marginal difference in the results. I was wondering whether the images still need to be resized offline to match the network input size and, if so, what exactly happens when I feed images of a different size into the network and how it handles them.

Which detection network did you train?
And what is the average resolution of the original training images? Are they almost all the same resolution?

We are using FasterRCNN. The original images are 1920 by 1080, downsampled to 960 by 540.

If all your original images are 1920x1080, you do not need to resize.

See more in https://docs.nvidia.com/metropolis/TLT/archive/tlt-20/tlt-user-guide/text/supported_model_architectures.html#fasterrcnn

FasterRCNN

  • Input size : C * W * H (where C = 1 or 3; W >= 160; H >= 160)
  • Image format : JPG, JPEG, PNG
  • Label format : KITTI detection

Note

The tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
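In practice this means resizing both the images and the KITTI labels before training. Below is a minimal sketch of such an offline step, assuming Pillow is installed; the directory names and the 960x540 target are only placeholders for your own dataset layout, not anything TLT requires.

```python
# Sketch: resize images offline and scale KITTI bounding boxes to match.
# Paths and the 960x540 target are illustrative assumptions.
import os
from PIL import Image

SRC_IMAGES, SRC_LABELS = "images_1080p", "labels_1080p"
DST_IMAGES, DST_LABELS = "images_540p", "labels_540p"
TARGET_W, TARGET_H = 960, 540

os.makedirs(DST_IMAGES, exist_ok=True)
os.makedirs(DST_LABELS, exist_ok=True)

for name in os.listdir(SRC_IMAGES):
    stem, _ = os.path.splitext(name)
    img = Image.open(os.path.join(SRC_IMAGES, name))
    sx, sy = TARGET_W / img.width, TARGET_H / img.height

    img.resize((TARGET_W, TARGET_H)).save(os.path.join(DST_IMAGES, name))

    # KITTI detection labels keep the box as (left, top, right, bottom)
    # in columns 4-7; scale those by the same factors as the image.
    with open(os.path.join(SRC_LABELS, stem + ".txt")) as f:
        rows = [line.split() for line in f if line.strip()]
    for fields in rows:
        fields[4] = f"{float(fields[4]) * sx:.2f}"
        fields[5] = f"{float(fields[5]) * sy:.2f}"
        fields[6] = f"{float(fields[6]) * sx:.2f}"
        fields[7] = f"{float(fields[7]) * sy:.2f}"
    with open(os.path.join(DST_LABELS, stem + ".txt"), "w") as f:
        f.write("\n".join(" ".join(fields) for fields in rows) + "\n")
```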

What happens then when I use the 1920 by 1080 images on a network with an input of 960 by 540?

See Creating an Experiment Spec File — Transfer Learning Toolkit 2.0 documentation

If the output image height and the output image width of the preprocessing block don't match the dimensions of the input image, the dataloader either pads with zeros or crops to fit the output resolution. It does not resize the input images and labels to fit.
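So with 1920x1080 images and a 960x540 input size, the frames are cropped rather than scaled. The snippet below is only an illustration of that pad-or-crop behaviour, not TLT's actual dataloader code:

```python
# Illustration only (not the TLT dataloader): what "pad with zeros or crop,
# never resize" means for a 1920x1080 frame and a 960x540 output size.
import numpy as np

def pad_or_crop(image, out_h, out_w):
    h, w, c = image.shape
    out = np.zeros((out_h, out_w, c), dtype=image.dtype)
    copy_h, copy_w = min(h, out_h), min(w, out_w)
    # Pixels beyond the output size are discarded; smaller inputs are
    # zero-padded. Nothing is scaled, so anything outside the kept
    # 960x540 window of a 1920x1080 frame never reaches the network.
    out[:copy_h, :copy_w] = image[:copy_h, :copy_w]
    return out

frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
print(pad_or_crop(frame, 540, 960).shape)  # (540, 960, 3)
```

That is why offline resizing (with the bounding boxes scaled accordingly) is still the recommended path: it preserves the full field of view, whereas relying on the dataloader simply throws away most of each 1080p frame.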