According to this webinar slide, each frame of the Stanford Drone Dataset should be resized to 768x768 for TLT transfer learning, whereas the KITTI dataset contains images with a resolution of 1392x512. Do I need to explicitly resize the frames to 768x768 before training, or does TLT resize them automatically?
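If the resizing does have to be done offline, I assume the bounding boxes in the KITTI labels would need to be scaled along with the frames. Below is a rough sketch of what I mean. The function name `scale_kitti_label_line` and the `src_size` parameter are my own and not part of TLT. It relies only on the KITTI label layout, where fields 4 to 7 hold the box as left, top, right, bottom in pixels.

```python
def scale_kitti_label_line(line, src_size, dst_size=(768, 768)):
    """Scale the 2D bounding box of one KITTI label line to a new image size.

    src_size and dst_size are (width, height) tuples. This helper is
    hypothetical and only illustrates the arithmetic.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    # Independent scale factors per axis; the aspect ratio changes
    # whenever the source frame is not square.
    sx = dst_w / src_w
    sy = dst_h / src_h

    fields = line.split()
    # KITTI label fields 4..7: bbox left, top, right, bottom (pixels).
    left, top, right, bottom = (float(v) for v in fields[4:8])
    scaled = (left * sx, top * sy, right * sx, bottom * sy)
    fields[4:8] = [f"{v:.2f}" for v in scaled]
    return " ".join(fields)
```

The frames themselves would be resized with the same `dst_size`, so that the boxes and the pixels stay aligned.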
I have also found a GitHub repo that converts the Stanford Drone Dataset videos to frames and the annotations to KITTI format. However, its output shows some mismatches between frames and annotations. The first anomaly is that some videos have one more annotation than frames:
bookstore/video0: 13334 frames, 13335 annotations
bookstore/video2: 14557 frames, 14558 annotations
bookstore/video3: 14557 frames, 14558 annotations
The second anomaly is a much larger gap: nexus/video5 has 1061 frames but only 562 annotations.
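To see which files are actually affected, the frame and annotation folders could be compared by file name. This is a minimal sketch, assuming each frame and its label file share the same name apart from the extension, and that frames are `.jpg` and labels are `.txt`. Both are assumptions about the repo's output, and the function name `compare_frames_and_labels` is hypothetical.

```python
from pathlib import Path


def compare_frames_and_labels(image_dir, label_dir,
                              image_ext=".jpg", label_ext=".txt"):
    """Return (frames without a label, labels without a frame).

    Files are matched by stem, i.e. the file name without its extension.
    """
    frame_stems = {p.stem for p in Path(image_dir).glob(f"*{image_ext}")}
    label_stems = {p.stem for p in Path(label_dir).glob(f"*{label_ext}")}
    # Set differences give the unmatched files on each side.
    frames_without_label = sorted(frame_stems - label_stems)
    labels_without_frame = sorted(label_stems - frame_stems)
    return frames_without_label, labels_without_frame
```

Calling it on the image and annotation folders of one video returns two lists. For the +1 cases I would expect a single entry in the second list, and for nexus/video5 a long first list, but I have not confirmed either.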
Is there an official dataset format converter script for this specific case?
Thanks in advance :)