I first tried to train a ResNet-18 model with the publicly available KITTI dataset, to get an idea of how the whole TLT workflow works. I did have to write small scripts to resize all the images to the same resolution and rescale their bounding boxes accordingly; otherwise the training threw errors after a few epochs.
I successfully trained the model on a DGX-1 and deployed it to an AGX Xavier.
The training went smoothly: all the KITTI images have similar resolutions, so it was easy to resize them to a uniform resolution whose dimensions are multiples of 16 (a rough sketch of that script is below).
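For reference, the resize script was roughly along these lines. This is only a sketch, not the exact code: the target resolution and file paths are placeholders, and it assumes the standard KITTI label format where the bounding box occupies columns 4 to 7 (xmin, ymin, xmax, ymax).

```python
# Sketch of the resize + bounding-box rescale script (placeholder paths/size).
from PIL import Image

TARGET_W, TARGET_H = 1248, 384  # placeholder target, both multiples of 16


def resize_image_and_labels(img_path, label_path, out_img_path, out_label_path):
    img = Image.open(img_path)
    orig_w, orig_h = img.size
    sx, sy = TARGET_W / orig_w, TARGET_H / orig_h

    # Resize the image to the fixed training resolution.
    img.resize((TARGET_W, TARGET_H), Image.BILINEAR).save(out_img_path)

    # Rescale the KITTI bounding boxes with the same scale factors.
    with open(label_path) as f, open(out_label_path, "w") as out:
        for line in f:
            parts = line.split()
            # Standard KITTI fields 4..7 are xmin, ymin, xmax, ymax in pixels.
            parts[4] = f"{float(parts[4]) * sx:.2f}"
            parts[5] = f"{float(parts[5]) * sy:.2f}"
            parts[6] = f"{float(parts[6]) * sx:.2f}"
            parts[7] = f"{float(parts[7]) * sy:.2f}"
            out.write(" ".join(parts) + "\n")
```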
Then comes the real part: now I have to train a model on a custom dataset (all the required labels in KITTI format are already prepared).
I also now have good experience building all the spec files required to train a model successfully.
But here's the issue: the image resolutions differ widely from one another, i.e., the images have very different aspect ratios.
I can't simply resize all the images to the same resolution, as that would distort the sensitive parts of the images.
I am sharing all the analytics I have performed on their resolutions so far:
stats    size (KB)    aspect_ratio    height (px)    width (px)
mean         89.12        1.205335          323.0         224.0
std          85.11        0.604291          187.0         226.0
min           3.80        0.000000           49.0          27.0
25%          28.42        1.000000          175.0          99.0
50%          61.94        1.000000          272.0         154.0
75%         118.75        2.000000          427.0         247.0
max         823.51        6.000000         1122.0        1920.0
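For context, the numbers above were gathered with a small script along these lines. It is a sketch assuming PIL and pandas; the image directory and the height/width definition of the aspect ratio are assumptions on my part.

```python
# Sketch of how the resolution stats were collected (placeholder directory).
import os

import pandas as pd
from PIL import Image

IMAGE_DIR = "images/"  # placeholder path

rows = []
for name in os.listdir(IMAGE_DIR):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    path = os.path.join(IMAGE_DIR, name)
    w, h = Image.open(path).size
    rows.append({
        "size(kb)": os.path.getsize(path) / 1024.0,
        "aspect_ratio": h / w,  # assumed height / width
        "height": h,
        "width": w,
    })

df = pd.DataFrame(rows)
print(df.describe())  # produces the mean/std/min/percentile table above
```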
Histogram: height vs. width
https://66.media.tumblr.com/d07be29c4bf5ff6778aa6ea2ccf59a18/299825b9a9864a17-f1/s2048x3072/0832a6ef06cfc8b125e218b8fabbc39b6ad2328f.png
Can anyone please suggest a good way to preprocess these images for training, such that it doesn't destroy any sensitive information?