I plan to train a YOLOv4 model using the TAO Toolkit, but I have a few questions regarding the preprocessing of my images.
-
I will use the model on an HD video stream (1920x1080). Is it still OK to make the model's input dimensions smaller than that to shorten training time (for example, training the model with 1364x768 images)?
-
If my model's input dimensions have a different aspect ratio than 16:9, will the model see distorted images when I run inference on the HD video? Can this affect performance? (My understanding is that it will.)
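To make sure the question is clear, here is a small sketch of what I mean by distortion (the 1024x768 input dim is just a made-up example):

```python
# Sketch of the distortion I mean: scaling a 1920x1080 frame to a
# hypothetical non-16:9 input dim (1024x768 here, i.e. 4:3) stretches
# the two axes by different factors.
frame_w, frame_h = 1920, 1080   # HD video frame
input_w, input_h = 1024, 768    # made-up model input dim

scale_x = input_w / frame_w     # ~0.53
scale_y = input_h / frame_h     # ~0.71

print(f"x scale: {scale_x:.2f}, y scale: {scale_y:.2f}")
# Different factors, so objects end up squashed horizontally.
```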
-
I know that with YOLOv4, TAO resizes all input images during the augmentation process to match the model's input dimensions (distorting the images if necessary). But if I have images with various resolutions and aspect ratios, my understanding is that I should first make all my images the same resolution as the model's input dimensions, or at the very least give them the same aspect ratio, so that my training images don't get distorted. Is that correct? (A rough sketch of the kind of preprocessing I had in mind is below.)
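For context, one way I was considering to do that beforehand is to crop each image to the model aspect ratio and then resize it, so TAO's own resize no longer distorts anything. Just a rough OpenCV sketch, assuming a 1364x768 input dim; the KITTI label coordinates would obviously need the same crop offset and scale applied:

```python
import cv2

INPUT_W, INPUT_H = 1364, 768   # example model input dim
TARGET_AR = INPUT_W / INPUT_H

def crop_to_aspect_and_resize(src_path, dst_path):
    """Center-crop an image to the model aspect ratio, then resize it to
    the input dim, so no distortion happens during training.
    (The KITTI boxes would need the same crop offset / scale applied.)"""
    img = cv2.imread(src_path)
    h, w = img.shape[:2]
    if w / h > TARGET_AR:                 # too wide: crop the sides
        new_w = int(round(h * TARGET_AR))
        x0 = (w - new_w) // 2
        img = img[:, x0:x0 + new_w]
    else:                                 # too tall: crop top/bottom
        new_h = int(round(w / TARGET_AR))
        y0 = (h - new_h) // 2
        img = img[y0:y0 + new_h, :]
    out = cv2.resize(img, (INPUT_W, INPUT_H), interpolation=cv2.INTER_LINEAR)
    cv2.imwrite(dst_path, out)
```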
-
If yes, and I have an image that is originally smaller (or has one dimension smaller) than the input dimensions, what would be the best thing to do? Upscale the image enough that I can crop a portion the size of the input dimensions (potentially cutting out part of the annotated object), or is it possible to add padding to the image?
To be fair, both feel wrong. Is there a better solution, or should I just not include those images in my dataset?
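To show what I mean by the padding option, this is the kind of preprocessing I was imagining for the smaller images (just a sketch; I assume the KITTI boxes would simply have to be shifted by the same offsets):

```python
import cv2
import numpy as np

INPUT_W, INPUT_H = 1364, 768   # example model input dim

def pad_to_input_dim(src_path, dst_path):
    """Pad an image that is smaller than the input dim onto a constant-color
    canvas instead of upscaling + cropping it. The KITTI boxes would just
    need the (x_off, y_off) shift added to their coordinates."""
    img = cv2.imread(src_path)
    h, w = img.shape[:2]
    assert w <= INPUT_W and h <= INPUT_H, "only meant for smaller images"

    canvas = np.full((INPUT_H, INPUT_W, 3), 128, dtype=np.uint8)  # gray padding
    x_off = (INPUT_W - w) // 2
    y_off = (INPUT_H - h) // 2
    canvas[y_off:y_off + h, x_off:x_off + w] = img
    cv2.imwrite(dst_path, canvas)
    return x_off, y_off
```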