Model training early stop

rajaneconsys · June 25, 2020, 3:35am

Hi all
I am training the PeopleNet model on my custom dataset. While training, I have been getting the same loss for the last 3 epochs. How can I implement something like an early-stop callback, so that training stops automatically when the loss is no longer improving?

Morganh · June 25, 2020, 2:32pm

For the detectnet_v2 network, you can set the “checkpoint interval”. This is the interval (in epochs) at which tlt-train saves intermediate models.
You can stop training with Ctrl+C and find the saved models in the output folder.
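
To illustrate the early-stop idea from the question, here is a minimal sketch in generic Python. It is not part of tlt-train: the function name `should_stop` and the parameters `patience` and `min_delta` are hypothetical, and it assumes you collect the per-epoch loss values yourself (for example, from the training log). It only decides when stopping manually would be reasonable.

```python
def should_stop(losses, patience=3, min_delta=1e-4):
    """Return True if the last `patience` epochs did not improve on the
    best earlier loss by at least `min_delta`.

    `losses` is a list of per-epoch loss values, oldest first.
    All names here are illustrative, not TLT APIs.
    """
    # Not enough history yet to judge a plateau.
    if len(losses) <= patience:
        return False

    best_before = min(losses[:-patience])   # best loss before the window
    recent_best = min(losses[-patience:])   # best loss inside the window

    # Stop when the recent window failed to beat the earlier best.
    return recent_best > best_before - min_delta


if __name__ == "__main__":
    history = [1.0, 0.5, 0.5, 0.5, 0.5]
    print(should_stop(history))
```

With a checkpoint interval set, the model saved at or before the plateau is still available in the output folder after stopping.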

rajaneconsys · June 26, 2020, 9:29am

Thanks @Morganh for the reply.

rajaneconsys · June 26, 2020, 12:17pm

Hi @Morganh
I successfully completed training on my custom dataset, and when I ran the evaluation I got the following average precision:

Matching predictions to ground truth, class 1/1.: 100%|█| 47067/47067 [00:02<00:

Validation cost: 0.001158
Mean average_precision (in %): 54.3145

class name      average precision (in %)
------------    --------------------------
person          54.3145

Median Inference Time: 0.024809
2020-06-26 11:52:37,874 [INFO] iva.detectnet_v2.scripts.evaluate: Evaluation complete.
Time taken to run iva.detectnet_v2.scripts.evaluate:main: 0:05:20.913157.

I want to know what this 54.3145 average precision means. Is it the average precision at 0.5 IoU, or at some other threshold?

Morganh · June 26, 2020, 3:40pm

See “minimum_detection_ground_truth_overlap” in your spec.
It is the minimum IoU between a ground-truth box and a predicted box, after clustering, for the prediction to count as a valid detection.
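
For reference, IoU (intersection over union) between two boxes can be computed as in the sketch below. This is a generic illustration, not the TLT evaluation code: the function name `iou`, the `(x1, y1, x2, y2)` corner format, and the 0.5 threshold in the usage lines are assumptions for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Illustrative helper, not a TLT function.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    prediction = (5, 0, 15, 10)
    threshold = 0.5  # example value; use the one from your spec
    print(iou(ground_truth, prediction) >= threshold)
```

A prediction counts as a match only when its IoU with a ground-truth box reaches the threshold set in the spec.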

rajaneconsys · June 26, 2020, 5:31pm

Thanks @Morganh.
One more question: I evaluated on a hold-out dataset that I did not use for training, and for that dataset I am getting 0 average precision. The images in the hold-out dataset are 1920x1080, while my training images are 960x960.
So do I have to change the resolution of my hold-out dataset before evaluation?

Morganh · June 28, 2020, 9:32am

Yes.
But I think you can try tlt-infer first.
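
If the hold-out images are resized, any ground-truth boxes given in pixel coordinates need to be scaled by the same factors. The sketch below shows the arithmetic in plain Python. The function name `scale_box` and the `(x1, y1, x2, y2)` box format are assumptions for illustration; note that going from 1920x1080 to 960x960 changes the aspect ratio, so the x and y scale factors differ.

```python
def scale_box(box, src_size, dst_size):
    """Scale a pixel-coordinate box from one image size to another.

    box:      (x1, y1, x2, y2) in the source image
    src_size: (width, height) of the source image
    dst_size: (width, height) of the resized image
    Illustrative helper, not a TLT function.
    """
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


if __name__ == "__main__":
    # A box in a 1920x1080 image, mapped to a 960x960 image.
    print(scale_box((192, 108, 960, 540), (1920, 1080), (960, 960)))
```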
