This is 66% of the figures published in the blog (link in my original post), which is the same result as with 100k iterations. So going from 100k to 400k iterations because the number of GPUs was reduced from 8 to 2 might not be the best course of action. If the results in the blog converged in 100k iterations (8-GPU setup), then they should also converge in 100k iterations in my setup (2-GPU setup). The only difference should be that if the blog's training finished in 4 hours, mine will finish in 16 hours.
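A quick sketch of the arithmetic behind the 400k suggestion. It assumes synchronous data-parallel training with the same per-GPU batch size on both setups. The batch size below is a hypothetical placeholder and is not taken from the blog's spec. Keeping the total number of images processed constant means scaling the iteration count inversely with the GPU count:

```python
def images_seen(iterations: int, num_gpus: int, batch_per_gpu: int) -> int:
    # Each iteration processes one batch on every GPU (synchronous data parallelism assumed).
    return iterations * num_gpus * batch_per_gpu


def equivalent_iterations(base_iters: int, base_gpus: int, new_gpus: int) -> int:
    # Iterations needed on new_gpus to process as many images as base_iters on base_gpus,
    # assuming the per-GPU batch size is unchanged.
    return base_iters * base_gpus // new_gpus


BATCH_PER_GPU = 4  # hypothetical value, for illustration only

iters_2gpu = equivalent_iterations(100_000, base_gpus=8, new_gpus=2)
print(iters_2gpu)
print(images_seen(100_000, 8, BATCH_PER_GPU) == images_seen(iters_2gpu, 2, BATCH_PER_GPU))
print(images_seen(100_000, 2, BATCH_PER_GPU) / images_seen(100_000, 8, BATCH_PER_GPU))
```

Under these assumptions, 100k iterations on 2 GPUs processes a quarter of the images that 100k iterations on 8 GPUs does.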

Changing init_learning_rate from 0.02 (8 GPUs) to 0.005 (2 GPUs) did make sense, but sadly it didn't have much impact on the outcome.
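The change from 0.02 to 0.005 is consistent with the common linear scaling heuristic. Under that heuristic, the learning rate is scaled in proportion to the global batch size, which here is proportional to the GPU count if the per-GPU batch is fixed (an assumption). A minimal sketch follows. The function name is illustrative and is not part of TLT:

```python
def scale_learning_rate(base_lr: float, base_gpus: int, new_gpus: int) -> float:
    # Linear scaling heuristic: learning rate proportional to global batch size,
    # i.e. num_gpus * batch_per_gpu when the per-GPU batch size is fixed.
    return base_lr * new_gpus / base_gpus


lr_2gpu = scale_learning_rate(0.02, base_gpus=8, new_gpus=2)
print(round(lr_2gpu, 6))
```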

Could you please collect and share feedback on how the results in the blog can be reproduced? Thanks.

Regards,
Ghazni

PS: Listing only the AP values from log.txt for the 40 evaluation rounds (one after every 10,000 iterations).

Update: After changing the blog's warmup_steps from 0 to 1000, the AP is 0.31344375, which is close to the blog's AP (0.334154785).
Note that I am training with 8 GPUs (V100).
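For context, a warmup phase ramps the learning rate up over the first warmup_steps iterations instead of starting at the full init_learning_rate. The sketch below assumes a simple linear ramp from a small starting value. The exact schedule TLT uses is not confirmed here, and the start_lr parameter is a hypothetical name for illustration:

```python
def warmup_lr(step: int, init_lr: float, warmup_steps: int, start_lr: float = 0.0) -> float:
    # warmup_steps == 0 means no warmup: train at init_lr from the first step.
    if warmup_steps <= 0 or step >= warmup_steps:
        return init_lr
    # Linear interpolation from start_lr to init_lr over the warmup window.
    frac = step / warmup_steps
    return start_lr + (init_lr - start_lr) * frac


for step in (0, 250, 500, 1000, 5000):
    print(step, warmup_lr(step, init_lr=0.02, warmup_steps=1000))
```

Any learning-rate decay after the warmup window is not modeled in this sketch.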

Thank you. I'll do another training run with this spec. The only difference will be "init_learning_rate: 0.005", because my setup has 2 GPUs.

I hope I understand you correctly: you are now getting 0.30+ AP on an 8-GPU (V100) setup, presumably on a single-class problem, but haven't yet tried it with 2 GPUs.

At the end of the day these are the same calculations, so accuracy should not depend on the number of GPUs. Yes, the time needed to perform those calculations would definitely change (it will be 4 times longer), which is understandable.

OK, I'll wait for your comments. Thanks again for looking into this.

For 2 GPUs, please try triggering training with the spec below. According to the latest results from the NVIDIA internal team, training with 2 GPUs (V100) can reach an AP of 33.2 by the end.

Could you please share the logs produced by tlt-train in the internal run? Training with this spec in my setup will take about a week (6-7 days), so the logs will help me understand and compare the convergence. Thank you.