Optimal width and height of the images

What is the optimal width and height of the images for the following models?

  • Object detection: yolo_v3, yolo_v4, retinanet, ssd, and faster_rcnn
  • Segmentation: unet and mask_rcnn
  • Classification

Do you mean the resolution of the training images or the input_size of the model you want to train?

Please tell me about both of them.

For the input_size of the model (a small sketch for checking these size constraints follows the list):

Faster_rcnn: see FasterRCNN — TAO Toolkit 3.22.05 documentation

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128)

yolo_v3 or yolo_v4: see YOLOv3 — TAO Toolkit 3.22.05 documentation

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128, and W, H are multiples of 32)

ssd: see SSD — TAO Toolkit 3.22.05 documentation

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128)

retinanet: see RetinaNet — TAO Toolkit 3.22.05 documentation

  • Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128, and W, H are multiples of 32)

Mask_rcnn: see https://docs.nvidia.com/tao/tao-toolkit/text/instance_segmentation/mask_rcnn.html#input-requirement

  • Input size: C * W * H (where C = 3, W >= 128, H >= 128, and W, H are multiples of 2^max_level; e.g., with max_level = 6, W and H must be multiples of 2^6 = 64)

Unet: see UNET — TAO Toolkit 3.22.05 documentation

  • Input size: C * W * H (where C = 1 or 3; W = 572, H = 572 for vanilla unet; W >= 128, H >= 128 and W, H multiples of 32 for other archs)
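
Not a TAO utility, just a minimal Python sketch of how you could snap a desired resolution to the constraints above (minimum 128, multiples of 32 for yolo_v3/yolo_v4, retinanet, and non-vanilla unet; pass multiple=64 for mask_rcnn with max_level = 6). The helper name is mine.

```python
# Minimal sketch (not part of TAO): snap a target dimension to the
# nearest valid model input dimension under the constraints above.
def valid_input_dim(target: int, multiple: int = 32, minimum: int = 128) -> int:
    """Round `target` to the nearest multiple of `multiple`, at least `minimum`."""
    return max(minimum, round(target / multiple) * multiple)

# Example: a 1920x1080 source. 1080 is not a multiple of 32, so the spec
# file would need e.g. 1920x1088 (or a smaller size such as 960x544).
print(valid_input_dim(1920), valid_input_dim(1080))  # -> 1920 1088
print(valid_input_dim(1080, multiple=64))            # mask_rcnn, max_level=6 -> 1088
```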

For the resolution of the training images

Faster_rcnn: see FasterRCNN — TAO Toolkit 3.22.05 documentation. With a static input shape, you can either resize the images offline to the target resolution or enable automatic resizing during training (a sketch of offline resizing follows).
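
For illustration only (not from the docs): a minimal sketch of the offline option, assuming KITTI-format labels and Pillow. The paths, target size, and function name are hypothetical.

```python
# Minimal sketch (assumes KITTI-format labels and Pillow): resize one image
# to a fixed input shape and scale its bounding boxes to match.
from pathlib import Path
from PIL import Image

TARGET_W, TARGET_H = 960, 544  # hypothetical static input shape

def resize_sample(img_path: Path, label_path: Path,
                  out_img: Path, out_label: Path) -> None:
    img = Image.open(img_path)
    sx, sy = TARGET_W / img.width, TARGET_H / img.height
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(out_img)

    out_lines = []
    for line in label_path.read_text().splitlines():
        f = line.split()
        # KITTI bbox fields (left, top, right, bottom) sit at indices 4..7
        f[4] = f"{float(f[4]) * sx:.2f}"
        f[6] = f"{float(f[6]) * sx:.2f}"
        f[5] = f"{float(f[5]) * sy:.2f}"
        f[7] = f"{float(f[7]) * sy:.2f}"
        out_lines.append(" ".join(f))
    out_label.write_text("\n".join(out_lines) + "\n")
```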

yolo_v3, yolo_v4, retinanet, ssd: you do not need to resize images/labels; they are resized automatically during training.

Mask_rcnn or Unet: the images and masks do not need to match the model input size; they will be resized to the model input size during training.
