Troubles Replicating TLT Model Training Experiment with TAO

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

For tfrecord generation in TAO, there are some differences, mentioned in
https://github.com/NVIDIA/tao_tensorflow1_backend/blob/c7a3926ddddf3911842e057620bceb45bb5303cc/nvidia_tao_tf1/cv/detectnet_v2/dataloader/build_dataloader.py#L251-L276.

Do you mean the pretrained model? For the PeopleNet model you are using, see
PeopleNet | NVIDIA NGC. There are different versions of the unpruned models.

Actually, for the detectnet_v2 network, there are not many changes in the training config. In TAO, there is “enable_auto_resize”. You can set it to true. More info can be found in DetectNet_v2 - NVIDIA Docs.
It is a flag that enables automatic resizing during training. When it is set to True, offline resizing before training is no longer required. Enabling it can potentially increase the training time.
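
As a rough sketch of where the flag goes: as far as I know it sits in the preprocessing block of augmentation_config in the training spec. The width, height and channel values below are assumptions taken from the KITTI example (1248x384, 3 channels), not from your spec, so replace them with your own network input size and check the field names against the DetectNet_v2 docs.

```
augmentation_config {
  preprocessing {
    # Assumed network input size (KITTI example values); use your own.
    output_image_width: 1248
    output_image_height: 384
    output_image_channel: 3
    # Resize images and labels on the fly instead of resizing offline.
    enable_auto_resize: true
  }
}
```

The rest of the spec should not need to change for this flag.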

To narrow this down, I suggest you run with the KITTI dataset mentioned in the detectnet_v2 notebook to check whether the same behavior still occurs. For the KITTI dataset, all the images are 1248x384.
For TAO, the spec is https://github.com/NVIDIA/tao_tutorials/blob/95aca39c79cb9068593a6a9c3dcc7a509f4ad786/notebooks/tao_launcher_starter_kit/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
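
When comparing your own dataset against KITTI, it may also help to confirm whether all of your images share one resolution. Below is a minimal sketch in plain Python that tallies image sizes in a folder. It assumes the images are PNG files (as in KITTI) and reads only the PNG header. The function names `png_size` and `count_image_sizes` are made up for this example and are not part of TAO.

```python
import struct
from collections import Counter
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type "IHDR",
    # 16-19: width, 20-23: height (both big-endian unsigned 32-bit integers).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def count_image_sizes(image_dir):
    """Count how many *.png files in image_dir have each (width, height)."""
    return Counter(png_size(p) for p in sorted(Path(image_dir).glob("*.png")))
```

Calling `count_image_sizes("/path/to/images")` on your training image folder returns a Counter keyed by (width, height). If it holds more than one key, or the single key differs from the input size in your spec, the images need either offline resizing or enable_auto_resize.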