Hi,
I am testing LPRNet, which was newly added in TLT v3.0.
After following the data preparation steps, I started training, but it fails with the error below.
Error
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2021-02-15 11:08:06,127 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2021-02-15 11:08:06,127 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2021-02-15 11:08:06,297 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2021-02-15 11:08:06,298 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2021-02-15 11:08:06,821 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2021-02-15 11:08:06,822 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from specs/tutorial_spec.txt
2021-02-15 11:08:06,826 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2021-02-15 11:09:01,237 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
The shape of this layer does not match original model: td_dense
Loading the model as a pruned model.
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
image_input (InputLayer) [(None, 3, 48, 96)] 0
__________________________________________________________________________________________________
tf_op_layer_Sum (TensorFlowOpLa (None, 1, 48, 96) 0 image_input[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 48, 96) 640 tf_op_layer_Sum[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 48, 96) 256 conv1[0][0]
__________________________________________________________________________________________________
re_lu (ReLU) (None, 64, 48, 96) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 48, 96) 0 re_lu[0][0]
__________________________________________________________________________________________________
res2a_branch2a (Conv2D) (None, 64, 48, 96) 36928 max_pooling2d[0][0]
__________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_1 (ReLU) (None, 64, 48, 96) 0 bn2a_branch2a[0][0]
__________________________________________________________________________________________________
res2a_branch1 (Conv2D) (None, 64, 48, 96) 4160 max_pooling2d[0][0]
__________________________________________________________________________________________________
res2a_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_1[0][0]
__________________________________________________________________________________________________
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96) 256 res2a_branch1[0][0]
__________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add (TensorFlowOpLa (None, 64, 48, 96) 0 bn2a_branch1[0][0]
bn2a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_2 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add[0][0]
__________________________________________________________________________________________________
res2b_branch2a (Conv2D) (None, 64, 48, 96) 36928 re_lu_2[0][0]
__________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_3 (ReLU) (None, 64, 48, 96) 0 bn2b_branch2a[0][0]
__________________________________________________________________________________________________
res2b_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_3[0][0]
__________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp (None, 64, 48, 96) 0 re_lu_2[0][0]
bn2b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_4 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add_1[0][0]
__________________________________________________________________________________________________
res3a_branch2a (Conv2D) (None, 128, 24, 48) 73856 re_lu_4[0][0]
__________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_5 (ReLU) (None, 128, 24, 48) 0 bn3a_branch2a[0][0]
__________________________________________________________________________________________________
res3a_branch1 (Conv2D) (None, 128, 24, 48) 8320 re_lu_4[0][0]
__________________________________________________________________________________________________
res3a_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_5[0][0]
__________________________________________________________________________________________________
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48) 512 res3a_branch1[0][0]
__________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_2 (TensorFlowOp (None, 128, 24, 48) 0 bn3a_branch1[0][0]
bn3a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_6 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_2[0][0]
__________________________________________________________________________________________________
res3b_branch2a (Conv2D) (None, 128, 24, 48) 147584 re_lu_6[0][0]
__________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_7 (ReLU) (None, 128, 24, 48) 0 bn3b_branch2a[0][0]
__________________________________________________________________________________________________
res3b_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_7[0][0]
__________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_3 (TensorFlowOp (None, 128, 24, 48) 0 re_lu_6[0][0]
bn3b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_8 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_3[0][0]
2021-02-15 11:09:06,428 [INFO] __main__: Number of images in the training dataset: 351964
2021-02-15 11:09:06,428 [INFO] __main__: Number of images in the validation dataset: 49984
__________________________________________________________________________________________________
res4a_branch2a (Conv2D) (None, 256, 12, 24) 295168 re_lu_8[0][0]
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_9 (ReLU) (None, 256, 12, 24) 0 bn4a_branch2a[0][0]
__________________________________________________________________________________________________
res4a_branch1 (Conv2D) (None, 256, 12, 24) 33024 re_lu_8[0][0]
__________________________________________________________________________________________________
res4a_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_9[0][0]
__________________________________________________________________________________________________
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24) 1024 res4a_branch1[0][0]
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_4 (TensorFlowOp (None, 256, 12, 24) 0 bn4a_branch1[0][0]
bn4a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_10 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_4[0][0]
__________________________________________________________________________________________________
res4b_branch2a (Conv2D) (None, 256, 12, 24) 590080 re_lu_10[0][0]
__________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_11 (ReLU) (None, 256, 12, 24) 0 bn4b_branch2a[0][0]
__________________________________________________________________________________________________
res4b_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_11[0][0]
__________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_5 (TensorFlowOp (None, 256, 12, 24) 0 re_lu_10[0][0]
bn4b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_12 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_5[0][0]
__________________________________________________________________________________________________
res5a_branch2a (Conv2D) (None, 300, 12, 24) 691500 re_lu_12[0][0]
__________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_13 (ReLU) (None, 300, 12, 24) 0 bn5a_branch2a[0][0]
__________________________________________________________________________________________________
res5a_branch1 (Conv2D) (None, 300, 12, 24) 77100 re_lu_12[0][0]
__________________________________________________________________________________________________
res5a_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_13[0][0]
__________________________________________________________________________________________________
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24) 1200 res5a_branch1[0][0]
__________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_6 (TensorFlowOp (None, 300, 12, 24) 0 bn5a_branch1[0][0]
bn5a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_14 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_6[0][0]
__________________________________________________________________________________________________
res5b_branch2a (Conv2D) (None, 300, 12, 24) 810300 re_lu_14[0][0]
__________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_15 (ReLU) (None, 300, 12, 24) 0 bn5b_branch2a[0][0]
__________________________________________________________________________________________________
res5b_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_15[0][0]
__________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_7 (TensorFlowOp (None, 300, 12, 24) 0 re_lu_14[0][0]
bn5b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_16 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_7[0][0]
__________________________________________________________________________________________________
permute_feature (Permute) (None, 24, 12, 300) 0 re_lu_16[0][0]
__________________________________________________________________________________________________
flatten_feature (Reshape) (None, 24, 3600) 0 permute_feature[0][0]
__________________________________________________________________________________________________
lstm (LSTM) (None, 24, 512) 8423424 flatten_feature[0][0]
__________________________________________________________________________________________________
td_dense (TimeDistributed) (None, 24, 36) 18468 lstm[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (None, 24, 36) 0 td_dense[0][0]
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
Epoch 1/1000
Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 274, in <module>
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 270, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 195, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 1017, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
run_metadata=self.run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 14 num_classes: 36 labels: 10,25,1,0,10,35,0,5,7,1 labels seen so far: 10,25,1,0,10
[[{{node loss_2/softmax_loss/CTCLoss}}]]
(1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 14 num_classes: 36 labels: 10,25,1,0,10,35,0,5,7,1 labels seen so far: 10,25,1,0,10
[[{{node loss_2/softmax_loss/CTCLoss}}]]
[[loss_2/softmax_loss/CTCLoss/_6743]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
File "/usr/local/bin/lprnet", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
Training spec file
random_seed: 42
lpr_config {
hidden_units: 512
max_label_length: 13
arch: "baseline"
nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}
training_config {
batch_size_per_gpu: 32
num_epochs: 1000
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 1e-6
max_learning_rate: 1e-5
soft_start: 0.001
annealing: 0.5
}
}
regularizer {
type: L2
weight: 5e-4
}
}
eval_config {
validation_period_during_training: 5
batch_size: 32
}
augmentation_config {
output_width: 96
output_height: 48
output_channel: 3
keep_original_prob: 0.3
transform_prob: 0.5
rotate_degree: 5
}
dataset_config {
data_sources: {
label_directory_path: "/datasets/lpr_ocr_tlt/train/labels"
image_directory_path: "/datasets/lpr_ocr_tlt/train/images"
}
characters_list_file: "/datasets/lpr_ocr_tlt/characters.txt"
validation_data_sources: {
label_directory_path: "/datasets/lpr_ocr_tlt/val/labels"
image_directory_path: "/datasets/lpr_ocr_tlt/val/images"
}
}
Characters
0
1
2
3
4
5
6
7
8
9
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
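The label indices in the error message can be mapped back onto this character list. The following is only a minimal sketch: it assumes the indices follow the order of the entries in characters.txt (the actual mapping used by the spec loader is not shown in this post), and the values of `num_classes` and `labels` are copied from the error text.

```python
import string

# Character list from this post: digits 0-9 followed by a-z (36 entries).
characters = list(string.digits + string.ascii_lowercase)

# Values copied from the InvalidArgumentError message.
num_classes = 36
labels = [10, 25, 1, 0, 10, 35, 0, 5, 7, 1]

# The error message rejects any label index >= num_classes - 1.
limit = num_classes - 1

# Assumption: label index i corresponds to the i-th line of characters.txt.
decoded = "".join(characters[i] for i in labels)
offending = [
    (position, index, characters[index])
    for position, index in enumerate(labels)
    if index >= limit
]

print("decoded label:", decoded)
print("offending entries (position, index, character):", offending)
```

Under that assumption, the first rejected entry is index 35 (the last character, "z"), which matches "labels seen so far: 10,25,1,0,10" stopping just before it. This may indicate that the 36-way td_dense output has no spare class left once all 36 characters are used as labels, but that is a guess based on the log, not something confirmed here.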
All images are resized to the following configuration before training:
output_width: 96
output_height: 48
output_channel: 3
- The maximum label length is 12 characters.
- The label length is variable, between 9 and 12 characters.
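One way to test the dataset before training is to check every label against the character list and the length limit. This is a hedged sketch, not part of TLT: it assumes each label is a `.txt` file containing the plate string on a single line, and the helper names `load_characters` and `check_labels` are hypothetical.

```python
from pathlib import Path


def load_characters(path):
    """Read the character list file, one character per line."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def check_labels(label_dir, characters, max_label_length):
    """Return (file name, problem) pairs for labels that look suspicious."""
    problems = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        # Assumption: the whole file content is the label string.
        text = label_file.read_text().strip()
        if not text:
            problems.append((label_file.name, "empty label"))
            continue
        unknown = sorted(set(text) - characters)
        if unknown:
            problems.append((label_file.name, f"characters not in list: {unknown}"))
        if len(text) > max_label_length:
            problems.append((label_file.name, f"length {len(text)} > {max_label_length}"))
    return problems


# Example usage with the paths and max_label_length from the spec file above:
# chars = load_characters("/datasets/lpr_ocr_tlt/characters.txt")
# for name, problem in check_labels("/datasets/lpr_ocr_tlt/train/labels", chars, 13):
#     print(name, problem)
```

Because the character list is lowercase only, this check would also flag any label that contains uppercase letters or stray whitespace inside the plate string.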
Please suggest possible ways to test and debug this.
Thanks