When trying to train LPRNet using AWS cloud-based Nvidia Deeplearning AMI, we are getting following error at “Run Tao Training.”
Following is the command:
print("For multi-GPU, change --gpus based on your machine.")
!tao lprnet train --gpus 4 --gpu_index 0 1 2 3 \
-e $SPECS_DIR/tutorial_spec.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k $KEY \
-m $USER_EXPERIMENT_DIR/pretrained_lprnet_baseline18/lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
Following is the error log:
Invalid decryption. SavedModel file does not exist at: /tmp/tmpmwjx8tby.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmpg_ovqk33.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmpsgapohj0.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmp1n3zdre5.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Following is the entire log:
For multi-GPU, change --gpus based on your machine.
2022-03-15 13:52:07,494 [INFO] root: Registry: ['nvcr.io']
2022-03-15 13:52:07,557 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-15 13:52:07,564 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-15 13:52:13,735 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-15 13:52:13,735 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-15 13:52:13,736 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,519 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,519 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,519 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-15 13:52:14,519 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-15 13:52:14,521 [INFO] __main__: Loading pretrained weights. This may take a while...
2022-03-15 13:52:14,521 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,551 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,551 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-15 13:52:14,552 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-15 13:52:14,552 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-15 13:52:14,554 [INFO] __main__: Loading pretrained weights. This may take a while...
2022-03-15 13:52:14,554 [INFO] __main__: Loading pretrained weights. This may take a while...
Invalid decryption. SavedModel file does not exist at: /tmp/tmpmwjx8tby.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmpg_ovqk33.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmpsgapohj0.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
Invalid decryption. SavedModel file does not exist at: /tmp/tmp1n3zdre5.hdf5/{saved_model.pbtxt|saved_model.pb}. The key used to load the model is incorrect.
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[49159,1],2]
Exit code: 1
--------------------------------------------------------------------------
2022-03-15 13:52:19,485 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.