Custom training PeopleNet model

Hello,

I’m trying to retrain a PeopleNet model using custom data. I followed the guidelines given in https://devblogs.nvidia.com/training-custom-pretrained-models-using-tlt/, but I can’t get the training process to run.

Here’s the command used for training:

tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k nvidia-tlt --gpus 2

Below is the output log. As I understand it, the model (pretrained by NVIDIA) can’t be decrypted with my own key. Do I have to provide a specific key to load the model?

2020-05-02 21:47:02,541 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,545 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at configs/train_spec_peoplenet_v2.txt.
2020-05-02 21:47:02,546 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,700 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
2020-05-02 21:47:02,705 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
    
  File "./detectnet_v2/scripts/train.py", line 591, in run_experiment
    
  File "./detectnet_v2/scripts/train.py", line 476, in train_gridbox
    
  File "./detectnet_v2/scripts/train.py", line 286, in build_gridbox_model
    
  File "./detectnet_v2/model/detectnet_model.py", line 108, in construct_model
  File "./detectnet_v2/model/utilities.py", line 100, in model_io
  File "./common/utils.py", line 245, in decode_to_keras
    
IOError: Invalid decryption. Unable to open file (File signature not found). The key used to load the model is incorrect.
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[616,1],1]
  Exit code:    1

Please replace

-k nvidia-tlt

with your own NGC key and retry.

Hi Morganh! Thanks for the quick answer.

I had a typo in my previous training command. I’m already using my key:

tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k $KEY --gpus 2

However, the same error keeps appearing. Could it be that the model was pretrained with a different key, one I don’t have access to?

Hi pbcorrea,
My previous comment is not correct.

See https://ngc.nvidia.com/catalog/models/nvidia:tlt_peoplenet

Please use the key below and retry.

Model load key: tlt_encode
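
For reference, here is a rough sketch of the retry. It assumes the spec file, results directory, and GPU count from the original post stay unchanged and only the value passed to -k changes. The KEY and TRAIN_CMD variable names and the check for tlt-train on the PATH are just for illustration.

```shell
# Load key published on the PeopleNet model card (not a personal NGC key).
KEY=tlt_encode

# Same command as in the original post, with only the key swapped.
TRAIN_CMD="tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k $KEY --gpus 2"

# Print the command so the key in use is visible before launching.
echo "$TRAIN_CMD"

# Launch only where tlt-train is available (i.e. inside the TLT container).
if command -v tlt-train >/dev/null 2>&1; then
    $TRAIN_CMD
fi
```

If the "Invalid decryption" error still shows up with this key, it may be worth checking that the pretrained model path in the spec file points at the PeopleNet file downloaded from NGC.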

Great. I missed that part when downloading the model.
Thanks!