Hi Morgan,
Here is the spec file:
random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 24
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-5
      soft_start: 0.001
      annealing: 0.5
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 64
  output_height: 64
  output_channel: 3
  max_rotate_degree: 5
  rotate_prob: 0.5
  gaussian_kernel_size: 5
  gaussian_kernel_size: 7
  gaussian_kernel_size: 15
  blur_prob: 0.5
  reverse_color_prob: 0.5
  keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/openalpr/train/label"
    image_directory_path: "/workspace/tao-experiments/data/openalpr/train/image"
  }
  characters_list_file: "/workspace/tao-experiments/lprnet/specs/us_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/openalpr/val/label"
    image_directory_path: "/workspace/tao-experiments/data/openalpr/val/image"
  }
}
(We have already done offline augmentation, so we also tried removing the augmentation part from the spec file, but that gives an error as well.)
Error log:
For multi-GPU, change --gpus based on your machine.
2022-03-16 06:06:24,693 [INFO] root: Registry: ['nvcr.io']
2022-03-16 06:06:25,883 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-03-16 06:06:25,924 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-03-16 06:06:40,759 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,244 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,244 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-16 06:06:42,246 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,267 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,268 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-16 06:06:42,269 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,280 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,280 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-16 06:06:42,282 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,294 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-03-16 06:06:42,294 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt
2022-03-16 06:06:42,296 [INFO] __main__: Loading pretrained weights. This may take a while...
The shape of this layer does not match original model: lstm
Loading the model as a pruned model.
The shape of this layer does not match original model: lstm
Loading the model as a pruned model.
The shape of this layer does not match original model: lstm
Loading the model as a pruned model.
The shape of this layer does not match original model: lstm
Loading the model as a pruned model.
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2022-03-16 06:07:28,322 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
Initialize optimizer
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2022-03-16 06:07:28,659 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
Initialize optimizer
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2022-03-16 06:07:28,927 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
Initialize optimizer
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2022-03-16 06:07:29,247 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
image_input (InputLayer) [(None, 3, 48, 96)] 0
__________________________________________________________________________________________________
tf_op_layer_Sum (TensorFlowOpLa (None, 1, 48, 96) 0 image_input[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 48, 96) 640 tf_op_layer_Sum[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 48, 96) 256 conv1[0][0]
__________________________________________________________________________________________________
re_lu (ReLU) (None, 64, 48, 96) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 48, 96) 0 re_lu[0][0]
__________________________________________________________________________________________________
res2a_branch2a (Conv2D) (None, 64, 48, 96) 36928 max_pooling2d[0][0]
__________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_1 (ReLU) (None, 64, 48, 96) 0 bn2a_branch2a[0][0]
__________________________________________________________________________________________________
res2a_branch1 (Conv2D) (None, 64, 48, 96) 4160 max_pooling2d[0][0]
__________________________________________________________________________________________________
res2a_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_1[0][0]
__________________________________________________________________________________________________
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96) 256 res2a_branch1[0][0]
__________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add (TensorFlowOpLa (None, 64, 48, 96) 0 bn2a_branch1[0][0]
bn2a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_2 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add[0][0]
__________________________________________________________________________________________________
res2b_branch2a (Conv2D) (None, 64, 48, 96) 36928 re_lu_2[0][0]
__________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_3 (ReLU) (None, 64, 48, 96) 0 bn2b_branch2a[0][0]
__________________________________________________________________________________________________
res2b_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_3[0][0]
__________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp (None, 64, 48, 96) 0 re_lu_2[0][0]
bn2b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_4 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add_1[0][0]
__________________________________________________________________________________________________
res3a_branch2a (Conv2D) (None, 128, 24, 48) 73856 re_lu_4[0][0]
__________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_5 (ReLU) (None, 128, 24, 48) 0 bn3a_branch2a[0][0]
__________________________________________________________________________________________________
res3a_branch1 (Conv2D) (None, 128, 24, 48) 8320 re_lu_4[0][0]
__________________________________________________________________________________________________
res3a_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_5[0][0]
__________________________________________________________________________________________________
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48) 512 res3a_branch1[0][0]
__________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_2 (TensorFlowOp (None, 128, 24, 48) 0 bn3a_branch1[0][0]
bn3a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_6 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_2[0][0]
__________________________________________________________________________________________________
res3b_branch2a (Conv2D) (None, 128, 24, 48) 147584 re_lu_6[0][0]
__________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_7 (ReLU) (None, 128, 24, 48) 0 bn3b_branch2a[0][0]
__________________________________________________________________________________________________
res3b_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_7[0][0]
__________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_3 (TensorFlowOp (None, 128, 24, 48) 0 re_lu_6[0][0]
bn3b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_8 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_3[0][0]
__________________________________________________________________________________________________
res4a_branch2a (Conv2D) (None, 256, 12, 24) 295168 re_lu_8[0][0]
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_9 (ReLU) (None, 256, 12, 24) 0 bn4a_branch2a[0][0]
__________________________________________________________________________________________________
res4a_branch1 (Conv2D) (None, 256, 12, 24) 33024 re_lu_8[0][0]
__________________________________________________________________________________________________
res4a_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_9[0][0]
__________________________________________________________________________________________________
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24) 1024 res4a_branch1[0][0]
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_4 (TensorFlowOp (None, 256, 12, 24) 0 bn4a_branch1[0][0]
bn4a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_10 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_4[0][0]
__________________________________________________________________________________________________
res4b_branch2a (Conv2D) (None, 256, 12, 24) 590080 re_lu_10[0][0]
__________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_11 (ReLU) (None, 256, 12, 24) 0 bn4b_branch2a[0][0]
__________________________________________________________________________________________________
res4b_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_11[0][0]
__________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_5 (TensorFlowOp (None, 256, 12, 24) 0 re_lu_10[0][0]
bn4b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_12 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_5[0][0]
__________________________________________________________________________________________________
res5a_branch2a (Conv2D) (None, 300, 12, 24) 691500 re_lu_12[0][0]
__________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_13 (ReLU) (None, 300, 12, 24) 0 bn5a_branch2a[0][0]
__________________________________________________________________________________________________
res5a_branch1 (Conv2D) (None, 300, 12, 24) 77100 re_lu_12[0][0]
__________________________________________________________________________________________________
res5a_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_13[0][0]
__________________________________________________________________________________________________
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24) 1200 res5a_branch1[0][0]
__________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_6 (TensorFlowOp (None, 300, 12, 24) 0 bn5a_branch1[0][0]
bn5a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_14 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_6[0][0]
__________________________________________________________________________________________________
res5b_branch2a (Conv2D) (None, 300, 12, 24) 810300 re_lu_14[0][0]
__________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_15 (ReLU) (None, 300, 12, 24) 0 bn5b_branch2a[0][0]
__________________________________________________________________________________________________
res5b_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_15[0][0]
__________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_7 (TensorFlowOp (None, 300, 12, 24) 0 re_lu_14[0][0]
bn5b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_16 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_7[0][0]
__________________________________________________________________________________________________
permute_feature (Permute) (None, 24, 12, 300) 0 re_lu_16[0][0]
__________________________________________________________________________________________________
flatten_feature (Reshape) (None, 24, 3600) 0 permute_feature[0][0]
__________________________________________________________________________________________________
lstm (LSTM) (None, 24, 512) 8423424 flatten_feature[0][0]
__________________________________________________________________________________________________
td_dense (TimeDistributed) (None, 24, 36) 18468 lstm[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (None, 24, 36) 0 td_dense[0][0]
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
2022-03-16 06:07:29,293 [INFO] __main__: Number of images in the training dataset: 3406
2022-03-16 06:07:29,293 [INFO] __main__: Number of images in the validation dataset: 306
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 279, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 275, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 200, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 991, in train_on_batch
extract_tensors_from_dataset=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 572, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected image_input to have shape (3, 48, 96) but got array with shape (3, 64, 64)
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 279, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 275, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 200, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 991, in train_on_batch
extract_tensors_from_dataset=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 572, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected image_input to have shape (3, 48, 96) but got array with shape (3, 64, 64)
Epoch 1/24
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 279, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 275, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 200, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 991, in train_on_batch
extract_tensors_from_dataset=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 572, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected image_input to have shape (3, 48, 96) but got array with shape (3, 64, 64)
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 279, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 275, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 200, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 991, in train_on_batch
extract_tensors_from_dataset=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 2471, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_utils.py", line 572, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected image_input to have shape (3, 48, 96) but got array with shape (3, 64, 64)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[46717,1],1]
Exit code: 1
--------------------------------------------------------------------------
2022-03-16 06:07:32,366 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
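
For reference, the ValueError in the log says the loaded model expects image_input of shape (3, 48, 96), while the arrays fed to it have shape (3, 64, 64), which lines up with output_width: 64 and output_height: 64 in augmentation_config. Below is a minimal sketch of that comparison. It assumes the spec can be read as plain text with `key: value` lines and that shapes are channel-first (channels, height, width), as in the model summary. The helper name `spec_input_shape` is hypothetical and is not part of TAO.

```python
import re


def spec_input_shape(spec_text):
    """Return (channels, height, width) read from the output_* keys of a spec string.

    Hypothetical helper for illustration only; it does a plain-text search
    and does not parse the spec as a protobuf message.
    """
    values = {}
    for key in ("output_channel", "output_height", "output_width"):
        match = re.search(rf"^\s*{key}\s*:\s*(\d+)", spec_text, flags=re.MULTILINE)
        if match is None:
            raise KeyError(f"{key} not found in spec")
        values[key] = int(match.group(1))
    return (values["output_channel"], values["output_height"], values["output_width"])


# The relevant part of the spec file posted above.
spec_text = """
augmentation_config {
  output_width: 64
  output_height: 64
  output_channel: 3
}
"""

# Shape the model expects, copied from the ValueError in the log.
expected = (3, 48, 96)
actual = spec_input_shape(spec_text)
print("spec:", actual, "model expects:", expected, "match:", actual == expected)
```

With output_width: 96 and output_height: 48 the two shapes would agree. Whether changing the spec is the right fix here, rather than using a model built for 64x64 input, is an assumption that depends on the pretrained weights being loaded.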