Poor metric results after retraining MaskRCNN using the TLT notebook

Morganh · August 20, 2020, 7:02am

@ghazni
After checking, your previous learning rate (0.005) is fine.

But you need to increase total_steps, which is computed as:
total_steps = total_images * total_epochs / batch_size / nGPUs
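A minimal Python sketch of the formula above, plus its inverse (how many epochs a given step budget actually covers). It assumes batch_size is per GPU, as dividing by both batch_size and nGPUs implies. The helper names and the 100,000-image dataset size in the usage lines are hypothetical and only for illustration.

```python
def total_steps(total_images: int, total_epochs: float, batch_size: int, n_gpus: int) -> int:
    """Steps needed to make total_epochs passes over the dataset.

    Assumes batch_size is per GPU, so one step consumes batch_size * n_gpus images.
    """
    images_per_step = batch_size * n_gpus
    return round(total_images * total_epochs / images_per_step)


def epochs_covered(steps: int, total_images: int, batch_size: int, n_gpus: int) -> float:
    """Inverse of total_steps: how many epochs a given step budget trains for."""
    return steps * batch_size * n_gpus / total_images


# Hypothetical dataset of 100,000 images, 100k steps, batch_size 2:
print(epochs_covered(100_000, 100_000, 2, 8))  # 8 GPUs -> 16.0 epochs
print(epochs_covered(100_000, 100_000, 2, 2))  # 2 GPUs -> 4.0 epochs
```

With the same step count, the 2-GPU run covers only a quarter of the epochs of the 8-GPU run.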

Your batch size is 2, the same as the blog's.
You are training on 2 GPUs, while the blog used 8 GPUs.
Your total_steps is 100k, and the blog's total_steps is also 100k.

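With a quarter of the GPUs, each step processes a quarter as many images (2 × 2 = 4 vs. 2 × 8 = 16), so 100k steps cover only a quarter of the blog's training. Please increase total_steps to 400k (100k × 8 / 2).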
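The same adjustment as a small Python sketch: scale a reference step count so your run sees the same total number of images. `matched_total_steps` is a hypothetical helper, and it again assumes batch size is per GPU.

```python
def matched_total_steps(ref_steps: int, ref_batch: int, ref_gpus: int,
                        batch: int, gpus: int) -> int:
    """Scale a reference step count so this run sees the same number of images.

    Assumes batch sizes are per GPU (images per step = batch * gpus).
    """
    ref_images_seen = ref_steps * ref_batch * ref_gpus
    images_per_step = batch * gpus
    steps, remainder = divmod(ref_images_seen, images_per_step)
    # Round up so the step budget never falls short of the reference run.
    return steps + (1 if remainder else 0)


# Blog: 100k steps, batch 2, 8 GPUs. This run: batch 2, 2 GPUs.
print(matched_total_steps(100_000, 2, 8, 2, 2))  # 400000
```

Integer arithmetic with `divmod` avoids floating-point rounding when step counts get large.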