Custom training PeopleNet model

pbcorrea · May 2, 2020, 9:53pm

Hello,

I’m trying to retrain a PeopleNet model using custom data. I followed the guidelines in https://devblogs.nvidia.com/training-custom-pretrained-models-using-tlt/, but I can’t get the training process to run.

Here’s the command used for training:

tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k nvidia-tlt --gpus 2

Below is the output log. As I understand it, the model (pretrained by NVIDIA) can’t be decrypted with my own key. Do I have to provide a specific key to load the model?

2020-05-02 21:47:02,541 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,545 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at configs/train_spec_peoplenet_v2.txt.
2020-05-02 21:47:02,546 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,700 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
2020-05-02 21:47:02,705 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
    
  File "./detectnet_v2/scripts/train.py", line 591, in run_experiment
    
  File "./detectnet_v2/scripts/train.py", line 476, in train_gridbox
    
  File "./detectnet_v2/scripts/train.py", line 286, in build_gridbox_model
    
  File "./detectnet_v2/model/detectnet_model.py", line 108, in construct_model
  File "./detectnet_v2/model/utilities.py", line 100, in model_io
  File "./common/utils.py", line 245, in decode_to_keras
    
IOError: Invalid decryption. Unable to open file (File signature not found). The key used to load the model is incorrect.
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[616,1],1]
  Exit code:    1
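The last line of the traceback is worth decoding. "Unable to open file (File signature not found)" is the message HDF5 (through h5py) produces when a file does not contain the HDF5 format signature. Given the function name `decode_to_keras`, TLT appears to decrypt the `.tlt` file with the `-k` key and then open the result as a Keras HDF5 model. With the wrong key, the decrypted bytes are noise and the signature check fails. That is an inference from the log, not documented TLT behaviour. The sketch below only illustrates the signature check. It does not decrypt anything, and the helper name `has_hdf5_signature` is hypothetical.

```python
import io

# 8-byte HDF5 format signature that begins the HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(stream) -> bool:
    """Return True if the binary stream contains the HDF5 signature.

    HDF5 places its superblock at offset 0 or, when a user block is
    present, at offset 512, 1024, 2048, ... (powers of two from 512).
    """
    offset = 0
    while True:
        stream.seek(offset)
        chunk = stream.read(len(HDF5_SIGNATURE))
        if len(chunk) < len(HDF5_SIGNATURE):
            return False  # ran past the end of the data without a match
        if chunk == HDF5_SIGNATURE:
            return True
        offset = 512 if offset == 0 else offset * 2


if __name__ == "__main__":
    valid = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 100)
    # Stand-in for bytes produced by decrypting with the wrong key.
    garbage = io.BytesIO(b"\x13\x37" * 600)
    print(has_hdf5_signature(valid))    # True
    print(has_hdf5_signature(garbage))  # False
```

To check a real file, open it in binary mode (`open(path, "rb")`) and pass the file object to the function.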

Morganh · May 3, 2020, 3:40am

Please replace the following

-k nvidia-tlt

with your own NGC key and retry.

pbcorrea · May 3, 2020, 4:07am

Hi Morganh! Thanks for the quick answer.

I had a typo in my previous training command. I’m already using my key:

tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k $KEY --gpus 2

However, the same error keeps appearing. Could it be that the model was saved with a different key, one I don’t have access to?

Morganh · May 3, 2020, 4:19am

Hi pbcorrea,
My previous comment was incorrect.

See https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet

Please use the key below and retry.

Model load key: tlt_encode
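For reference, here is the training command from this thread with the NGC model load key passed to `-k`. The sketch assembles it in Python so the key comes from a single lookup table instead of being typed by hand. The helper `build_train_cmd` and the table `NGC_MODEL_LOAD_KEYS` are hypothetical. Only the PeopleNet entry (`tlt_encode`) comes from the NGC page linked above. The code only builds the argument list and does not run TLT.

```python
import shlex
from typing import List

# Load keys published on each pretrained model's NGC page.
# Only the PeopleNet entry comes from this thread (hypothetical table).
NGC_MODEL_LOAD_KEYS = {
    "tlt_peoplenet": "tlt_encode",
}


def build_train_cmd(spec: str, results_dir: str, model: str, gpus: int = 1) -> List[str]:
    """Assemble the tlt-train argv, passing the model's NGC load key via -k."""
    key = NGC_MODEL_LOAD_KEYS[model]  # raises KeyError if the load key is unknown
    return [
        "tlt-train", "detectnet_v2",
        "-e", spec,
        "-r", results_dir,
        "-k", key,
        "--gpus", str(gpus),
    ]


if __name__ == "__main__":
    cmd = build_train_cmd(
        "configs/train_spec_peoplenet_v2.txt",
        "experiments/peoplenet/hybrid_v1",
        "tlt_peoplenet",
        gpus=2,
    )
    # Prints: tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt
    #         -r experiments/peoplenet/hybrid_v1 -k tlt_encode --gpus 2
    print(shlex.join(cmd))
```

To actually launch training inside the TLT container, pass the list to `subprocess.run(cmd, check=True)`.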

pbcorrea · May 3, 2020, 4:22am

Great. I missed that part when downloading the model.
Thanks!

Related topics
Topic	Category	Replies	Views	Activity
Errors while running inference on Peoplenet Model (no training)	TAO Toolkit	3	452	December 28, 2021
TLT v3 fails to train PeopleNet model from ngc	TAO Toolkit	5	613	August 29, 2021
Peoplenet model training not getting started	TAO Toolkit	2	302	April 5, 2024
Unable to open file (File signature not found)	TAO Toolkit	9	2008	October 12, 2021
The problem of the key used to load the model is incorrect	TAO Toolkit	5	585	April 13, 2023
No detections after training PeopleNet using custom labeled data	TAO Toolkit	7	867	October 12, 2021
Tlt 3.0 retrained vehicletypenet, classification net error when loaded pretrained model	TAO Toolkit	4	403	October 12, 2021
Error while trying to train new data with LPRnet transfer learning	TAO Toolkit	2	911	March 15, 2022
Pretrained model file not found	TAO Toolkit	2	462	October 12, 2021
When prune is executed, "OSError: Invalid decryption. Unable to open file (file signature not found). " occurs	TAO Toolkit	18	695	April 27, 2022
