Hello,
I’m trying to retrain a PeopleNet model on custom data. I followed the guidelines at https://devblogs.nvidia.com/training-custom-pretrained-models-using-tlt/, but I can’t get the training process to run.
Here’s the command used for training:
tlt-train detectnet_v2 -e configs/train_spec_peoplenet_v2.txt -r experiments/peoplenet/hybrid_v1 -k nvidia-tlt --gpus 2
The output log is below. As I understand from the log, the model (pretrained by NVIDIA) can’t be decrypted with my own key. Do I have to provide a specific key to load the model?
```
2020-05-02 21:47:02,541 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,545 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at configs/train_spec_peoplenet_v2.txt.
2020-05-02 21:47:02,546 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from configs/train_spec_peoplenet_v2.txt
2020-05-02 21:47:02,700 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
2020-05-02 21:47:02,705 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 12143 samples with a batch size of 32; each epoch will therefore take one extra step.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 47, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 667, in main
  File "./detectnet_v2/scripts/train.py", line 591, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 476, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 286, in build_gridbox_model
  File "./detectnet_v2/model/detectnet_model.py", line 108, in construct_model
  File "./detectnet_v2/model/utilities.py", line 100, in model_io
  File "./common/utils.py", line 245, in decode_to_keras
IOError: Invalid decryption. Unable to open file (File signature not found). The key used to load the model is incorrect.
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[616,1],1]
Exit code: 1
```
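
In case it helps with reproducing this, here is a minimal sketch of how the training command is assembled, with the key isolated in one argument so it is easy to swap if a different key turns out to be required. The helper name `build_train_command` is hypothetical and only for illustration; the flags and values are exactly the ones from the command above, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def build_train_command(spec, results_dir, key, gpus):
    """Assemble the tlt-train detectnet_v2 command as an argument list.

    Hypothetical helper for illustration; it uses only the flags that
    appear in the command from this post (-e, -r, -k, --gpus).
    """
    return [
        "tlt-train", "detectnet_v2",
        "-e", spec,          # experiment spec file
        "-r", results_dir,   # results directory
        "-k", key,           # key used to load/encrypt the model
        "--gpus", str(gpus),
    ]


# "nvidia-tlt" is my own key; per the error message, this is the value
# that would need to change if the pretrained model expects another key.
cmd = build_train_command(
    spec="configs/train_spec_peoplenet_v2.txt",
    results_dir="experiments/peoplenet/hybrid_v1",
    key="nvidia-tlt",
    gpus=2,
)

# Print the shell-quoted command line instead of executing it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the key as a single parameter means the same call can be repeated with another key without touching the rest of the command.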