With reference to the DetectNet_v2 — Transfer Learning Toolkit 3.0 documentation (nvidia.com) and the sample Jupyter notebook/dataset:
Could I please get some guidance on how to train a model using my own data?
The image resolution and aspect ratio in the sample is an unusual 1248x384. My setup:
Native image size = 1920x1080 (16:9 aspect ratio).
Annotations are converted to KITTI format, which uses absolute pixel coordinates rather than relative or percentage values, so I'm assuming that if I resize the images, I'd also need to regenerate the KITTI annotations?
TFRecords are generated from the above.
I don't necessarily need to run inference at this resolution, but certainly at a 16:9 aspect ratio.
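In case it helps anyone in the same situation, here is a minimal sketch of how I would regenerate the KITTI labels after resizing the images (file names and the helper are hypothetical; the images themselves would be resized separately, e.g. with Pillow). It relies on the KITTI label format, where the 2D bounding box occupies the 5th–8th space-separated fields (xmin, ymin, xmax, ymax):

```python
# Sketch: rescale the 2D bbox fields of KITTI labels when resizing images,
# e.g. 1920x1080 -> 1280x720. Names and paths here are placeholders.
# KITTI label columns: 0=type, 1=truncated, 2=occluded, 3=alpha,
# 4-7=bbox (xmin, ymin, xmax, ymax), then the 3D fields (left unchanged).

def rescale_kitti_line(line, sx, sy):
    """Scale the 2D bbox of one KITTI label line by (sx, sy)."""
    f = line.split()
    for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
        f[i] = f"{float(f[i]) * s:.2f}"
    return " ".join(f)

def rescale_kitti_file(src_txt, dst_txt, sx, sy):
    """Rewrite a whole label file with scaled bboxes."""
    with open(src_txt) as fin, open(dst_txt, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(rescale_kitti_line(line, sx, sy) + "\n")

# Scale factors for 1920x1080 -> 1280x720
sx, sy = 1280 / 1920, 720 / 1080
```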
I understand that the width and height need to be multiples of 16,
i.e. a width of 1920 is fine, but a height of 1080 is not (1080/16 = 67.5).
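To make that arithmetic concrete, rounding a dimension up to the next multiple of 16 is just a ceiling division:

```python
def next_multiple_of_16(n):
    # Round n up to the nearest multiple of 16 (ceiling division).
    return -(-n // 16) * 16

# 1920 is already valid; 1080 rounds up to 1088 (8 extra rows).
print(next_multiple_of_16(1920), next_multiple_of_16(1080))  # → 1920 1088
```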
Does this mean that:
1. I just change the training augmentation_config to the next multiple of 16, i.e. output_image_width = 1920, output_image_height = 1088? It looks like the images then get padded.
2. I need to resize the images and annotations to a size whose sides are multiples of 16, e.g. 1280x720, which keeps the 16:9 aspect ratio and also satisfies the multiple-of-16 rule?
3. Or is there some other option?
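For option 1, my understanding (please correct me if wrong) is that only the preprocessing block of the training spec would change. A sketch based on the sample DetectNet_v2 spec, with the min_bbox values as placeholders copied from the sample rather than tuned:

```
augmentation_config {
  preprocessing {
    output_image_width: 1920
    output_image_height: 1088
    min_bbox_width: 1.0
    min_bbox_height: 1.0
    output_image_channel: 3
  }
}
```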
I think others would need this as well, so a sample dataset or training config would be fantastic.
For now, my training config uses the next nearest size divisible by 16 and training seems to be running OK, but I'm not sure whether this is the best approach, or whether it means I'll have to run inference at this resolution.