Question about transfer learning to PeopleNet

Hi, I'm going to build a dataset for transfer learning with PeopleNet.

I saw in the documentation that NVIDIA's internal dataset for PeopleNet was made up of 960x544 resolution images.

So I wonder: when I build my dataset, must the images be 960x544?

for example

  1. Must a 'people image' [600x400] be resized to [960x544]?


  2. Can a lot of small face images [200x200] be composited into one image [960x544]?


I would like to get some advice before making the dataset.

thanks

  1. The images do not have to be 960x544, but they need to meet the requirements below.

DetectNet_v2

  • Input size: C * W * H (where C = 1 or 3, W >= 480, H >= 272, and W, H are multiples of 16)
  • Image format: JPG, JPEG, PNG
  2. Yes, you can composite.
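
To make the two answers above concrete, here is a minimal Python sketch. It is not part of TLT; the function names `is_valid_detectnet_v2_size` and `tile_slots` are made up for illustration. The first function checks a candidate size against the DetectNet_v2 limits quoted above, and the second works out where 200x200 tiles could be pasted on a 960x544 canvas, assuming a simple grid layout with no overlap.

```python
def is_valid_detectnet_v2_size(width, height, channels=3):
    """Check a size against the DetectNet_v2 input limits quoted in this thread."""
    return (
        channels in (1, 3)
        and width >= 480
        and height >= 272
        and width % 16 == 0
        and height % 16 == 0
    )


def tile_slots(canvas_w, canvas_h, tile_w, tile_h):
    """Return (x1, y1, x2, y2) paste slots, packed left to right, top to bottom.

    Hypothetical layout helper: only whole tiles that fit on the canvas are kept.
    """
    slots = []
    for y in range(0, canvas_h - tile_h + 1, tile_h):
        for x in range(0, canvas_w - tile_w + 1, tile_w):
            slots.append((x, y, x + tile_w, y + tile_h))
    return slots


if __name__ == "__main__":
    print(is_valid_detectnet_v2_size(960, 544))  # meets all limits
    print(is_valid_detectnet_v2_size(600, 400))  # 600 is not a multiple of 16
    print(len(tile_slots(960, 544, 200, 200)))   # number of 200x200 tiles that fit
```

Note that 600x400 would not pass this check, because 600 is not a multiple of 16. For a composite image, each face label has to be offset by the origin of the slot its tile was pasted into.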

hi @Morganh

i have one more question

Should the input images all be unified into one resolution?

e.g. must all images have the same resolution [such as 600x400], or can different resolutions be mixed [600x400 and 960x544]?

They should be unified into one resolution.
See the note below from the TLT user guide.

Note: The tlt-train tool does not support training on images of multiple resolutions, or resizing images during training. All of the images must be resized offline to the final training size and the corresponding bounding boxes must be scaled accordingly.
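
As a rough illustration of the offline step described in the note above, the sketch below scales a bounding box when an image is resized from one resolution to another. The helper name `scale_box` and the (x1, y1, x2, y2) pixel box format are assumptions for this example, not something defined by TLT. The image itself would be resized separately with an image library.

```python
def scale_box(box, src_size, dst_size):
    """Scale an (x1, y1, x2, y2) pixel box from src_size to dst_size.

    src_size and dst_size are (width, height) tuples. This is a plain
    stretch: the aspect ratio is not preserved if the two sizes differ in shape.
    """
    x1, y1, x2, y2 = box
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    return (
        x1 * dst_w / src_w,
        y1 * dst_h / src_h,
        x2 * dst_w / src_w,
        y2 * dst_h / src_h,
    )


if __name__ == "__main__":
    # A box labelled on a 600x400 image, mapped to the 960x544 training size.
    print(scale_box((60, 40, 300, 200), (600, 400), (960, 544)))
```

Because 600x400 and 960x544 have different aspect ratios, a plain stretch like this distorts the people in the image slightly. Padding the image to the target aspect ratio before resizing is an alternative, in which case the boxes need an offset as well as a scale.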


thank you Morganh !