LPRnet saving a checkpoint on every epoch (huge disk consumption)

• Hardware: GTX 1660
• Network Type: LPRnet
• TLT Version: docker_tag: v3.0-py3
• Training spec file: tutorial_spec.txt (1.3 KB)

I noticed that some models accept the checkpoint_interval parameter in the spec file, but nothing about it is mentioned in the LPRnet documentation.
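For reference, in networks that do support it (DetectNet_v2, for example), the parameter sits in the training_config section of the spec file, roughly like this (the interval value here is illustrative):

```
training_config {
  # Save a checkpoint every 10 epochs instead of every epoch.
  checkpoint_interval: 10
}
```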

Does LPRnet accept this parameter? I’m trying to use it, but TLT still saves a checkpoint on every epoch.

LPRnet does not support saving intermediate models yet.
As a workaround, please write a script that deletes the saved models at an interval.
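A minimal sketch of such a cleanup script, assuming the checkpoints land in a single directory and follow a name pattern like lprnet_epoch-012.tlt (both the directory path and the pattern below are assumptions; adjust them to match your experiment):

```python
import os
import re

CHECKPOINT_DIR = "/workspace/tlt-experiments/lprnet/weights"  # assumption: adjust to your output dir
KEEP_INTERVAL = 10  # keep every 10th epoch, plus the latest

# Assumed filename pattern; check how your checkpoints are actually named.
pattern = re.compile(r"lprnet_epoch-(\d+)\.tlt$")

# Collect (epoch, path) pairs for every checkpoint file found.
checkpoints = []
for name in os.listdir(CHECKPOINT_DIR):
    match = pattern.search(name)
    if match:
        checkpoints.append((int(match.group(1)), os.path.join(CHECKPOINT_DIR, name)))

checkpoints.sort()
latest_epoch = checkpoints[-1][0] if checkpoints else None

for epoch, path in checkpoints:
    # Keep every Nth epoch and the most recent checkpoint (useful for resuming).
    if epoch % KEEP_INTERVAL == 0 or epoch == latest_epoch:
        continue
    print(f"Deleting {path}")
    os.remove(path)
```

Running it after (or periodically during) training keeps disk usage bounded without touching the checkpoints you actually want.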


I’m able to save the intermediate models with the provided code. Can you clarify your statement?

That’s a nice idea, but can you please confirm whether checkpoint_interval can be used, or whether there is something similar?

Could you share how you ran it?

I’m using the notebook provided in TLT Quick Start Guide, specifically the one named lprnet.ipynb. In this notebook, the default behavior is to save intermediate models for every epoch.

OK, yes, LPRnet can save all the models, one for every epoch. But LPRnet does not support saving models at an interval yet.