Lprnet training error (non-null label, index >= num_classes - 1)

Hi,

I am testing lprnet, which was newly added in tlt-v3.0.
After following the data preparation steps, I started the training but it is failing.

Error

Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-02-15 11:08:06,127 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-02-15 11:08:06,127 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-02-15 11:08:06,297 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-02-15 11:08:06,298 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-02-15 11:08:06,821 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-02-15 11:08:06,822 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from specs/tutorial_spec.txt
2021-02-15 11:08:06,826 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2021-02-15 11:09:01,237 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
The shape of this layer does not match original model: td_dense
Loading the model as a pruned model.
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
image_input (InputLayer)        [(None, 3, 48, 96)]  0                                            
__________________________________________________________________________________________________
tf_op_layer_Sum (TensorFlowOpLa (None, 1, 48, 96)    0           image_input[0][0]                
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 48, 96)   640         tf_op_layer_Sum[0][0]            
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 48, 96)   256         conv1[0][0]                      
__________________________________________________________________________________________________
re_lu (ReLU)                    (None, 64, 48, 96)   0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 64, 48, 96)   0           re_lu[0][0]                      
__________________________________________________________________________________________________
res2a_branch2a (Conv2D)         (None, 64, 48, 96)   36928       max_pooling2d[0][0]              
__________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_1 (ReLU)                  (None, 64, 48, 96)   0           bn2a_branch2a[0][0]              
__________________________________________________________________________________________________
res2a_branch1 (Conv2D)          (None, 64, 48, 96)   4160        max_pooling2d[0][0]              
__________________________________________________________________________________________________
res2a_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_1[0][0]                    
__________________________________________________________________________________________________
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96)   256         res2a_branch1[0][0]              
__________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add (TensorFlowOpLa (None, 64, 48, 96)   0           bn2a_branch1[0][0]               
                                                                 bn2a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_2 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add[0][0]            
__________________________________________________________________________________________________
res2b_branch2a (Conv2D)         (None, 64, 48, 96)   36928       re_lu_2[0][0]                    
__________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_3 (ReLU)                  (None, 64, 48, 96)   0           bn2b_branch2a[0][0]              
__________________________________________________________________________________________________
res2b_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_3[0][0]                    
__________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp (None, 64, 48, 96)   0           re_lu_2[0][0]                    
                                                                 bn2b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_4 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add_1[0][0]          
__________________________________________________________________________________________________
res3a_branch2a (Conv2D)         (None, 128, 24, 48)  73856       re_lu_4[0][0]                    
__________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_5 (ReLU)                  (None, 128, 24, 48)  0           bn3a_branch2a[0][0]              
__________________________________________________________________________________________________
res3a_branch1 (Conv2D)          (None, 128, 24, 48)  8320        re_lu_4[0][0]                    
__________________________________________________________________________________________________
res3a_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_5[0][0]                    
__________________________________________________________________________________________________
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48)  512         res3a_branch1[0][0]              
__________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_2 (TensorFlowOp (None, 128, 24, 48)  0           bn3a_branch1[0][0]               
                                                                 bn3a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_6 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_2[0][0]          
__________________________________________________________________________________________________
res3b_branch2a (Conv2D)         (None, 128, 24, 48)  147584      re_lu_6[0][0]                    
__________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_7 (ReLU)                  (None, 128, 24, 48)  0           bn3b_branch2a[0][0]              
__________________________________________________________________________________________________
res3b_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_7[0][0]                    
__________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_3 (TensorFlowOp (None, 128, 24, 48)  0           re_lu_6[0][0]                    
                                                                 bn3b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_8 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_3[0][0]          
2021-02-15 11:09:06,428 [INFO] __main__: Number of images in the training dataset:	351964
2021-02-15 11:09:06,428 [INFO] __main__: Number of images in the validation dataset:	 49984

__________________________________________________________________________________________________
res4a_branch2a (Conv2D)         (None, 256, 12, 24)  295168      re_lu_8[0][0]                    
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_9 (ReLU)                  (None, 256, 12, 24)  0           bn4a_branch2a[0][0]              
__________________________________________________________________________________________________
res4a_branch1 (Conv2D)          (None, 256, 12, 24)  33024       re_lu_8[0][0]                    
__________________________________________________________________________________________________
res4a_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_9[0][0]                    
__________________________________________________________________________________________________
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24)  1024        res4a_branch1[0][0]              
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_4 (TensorFlowOp (None, 256, 12, 24)  0           bn4a_branch1[0][0]               
                                                                 bn4a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_10 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_4[0][0]          
__________________________________________________________________________________________________
res4b_branch2a (Conv2D)         (None, 256, 12, 24)  590080      re_lu_10[0][0]                   
__________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_11 (ReLU)                 (None, 256, 12, 24)  0           bn4b_branch2a[0][0]              
__________________________________________________________________________________________________
res4b_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_11[0][0]                   
__________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_5 (TensorFlowOp (None, 256, 12, 24)  0           re_lu_10[0][0]                   
                                                                 bn4b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_12 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_5[0][0]          
__________________________________________________________________________________________________
res5a_branch2a (Conv2D)         (None, 300, 12, 24)  691500      re_lu_12[0][0]                   
__________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_13 (ReLU)                 (None, 300, 12, 24)  0           bn5a_branch2a[0][0]              
__________________________________________________________________________________________________
res5a_branch1 (Conv2D)          (None, 300, 12, 24)  77100       re_lu_12[0][0]                   
__________________________________________________________________________________________________
res5a_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_13[0][0]                   
__________________________________________________________________________________________________
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24)  1200        res5a_branch1[0][0]              
__________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_6 (TensorFlowOp (None, 300, 12, 24)  0           bn5a_branch1[0][0]               
                                                                 bn5a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_14 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_6[0][0]          
__________________________________________________________________________________________________
res5b_branch2a (Conv2D)         (None, 300, 12, 24)  810300      re_lu_14[0][0]                   
__________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_15 (ReLU)                 (None, 300, 12, 24)  0           bn5b_branch2a[0][0]              
__________________________________________________________________________________________________
res5b_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_15[0][0]                   
__________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_7 (TensorFlowOp (None, 300, 12, 24)  0           re_lu_14[0][0]                   
                                                                 bn5b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_16 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_7[0][0]          
__________________________________________________________________________________________________
permute_feature (Permute)       (None, 24, 12, 300)  0           re_lu_16[0][0]                   
__________________________________________________________________________________________________
flatten_feature (Reshape)       (None, 24, 3600)     0           permute_feature[0][0]            
__________________________________________________________________________________________________
Traceback (most recent call last):
lstm (LSTM)                     (None, 24, 512)      8423424     flatten_feature[0][0]            
__________________________________________________________________________________________________
td_dense (TimeDistributed)      (None, 24, 36)       18468       lstm[0][0]                       
__________________________________________________________________________________________________
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 274, in <module>
softmax (Softmax)               (None, 24, 36)       0           td_dense[0][0]                   
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
Epoch 1/1000
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 270, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 195, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 1017, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 14 num_classes: 36 labels: 10,25,1,0,10,35,0,5,7,1 labels seen so far: 10,25,1,0,10
	 [[{{node loss_2/softmax_loss/CTCLoss}}]]
  (1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 14 num_classes: 36 labels: 10,25,1,0,10,35,0,5,7,1 labels seen so far: 10,25,1,0,10
	 [[{{node loss_2/softmax_loss/CTCLoss}}]]
	 [[loss_2/softmax_loss/CTCLoss/_6743]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/lprnet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

Training spec file

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 13
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}

training_config {
  batch_size_per_gpu: 32
  num_epochs: 1000
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 1e-6
    max_learning_rate: 1e-5
    soft_start: 0.001
    annealing: 0.5
  }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}

eval_config {
  validation_period_during_training: 5
  batch_size: 32
}

augmentation_config {
    output_width: 96
    output_height: 48
    output_channel: 3
    keep_original_prob: 0.3
    transform_prob: 0.5
    rotate_degree: 5
}

dataset_config {
  data_sources: {
    label_directory_path: "/datasets/lpr_ocr_tlt/train/labels"
    image_directory_path: "/datasets/lpr_ocr_tlt/train/images"
  }
  characters_list_file: "/datasets/lpr_ocr_tlt/characters.txt"
  validation_data_sources: {
    label_directory_path: "/datasets/lpr_ocr_tlt/val/labels"
    image_directory_path: "/datasets/lpr_ocr_tlt/val/images"
  }
}

Characters

0
1
2
3
4
5
6
7
8
9
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z

All images are resized to the configuration below before training:

output_width: 96
output_height: 48
output_channel: 3
  1. The maximum label length is 12 characters.
  2. The label length is variable, between 9 and 12 characters.
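
One possible first debugging step is to scan the label files against characters.txt and max_label_length. The sketch below is only an illustration: it assumes each label is a .txt file containing the plate string as plain text, and `scan_labels` and the demo file names are hypothetical helpers, not part of TLT.

```python
import os
import string
import tempfile
from collections import Counter


def scan_labels(label_dir, characters, max_label_length):
    """Return (too_long, unknown, counts) for the .txt label files in label_dir.

    Assumption: each label file holds the plate string as plain text.
    """
    allowed = set(characters)
    too_long = []       # files whose label exceeds max_label_length
    unknown = {}        # file name -> characters missing from the character list
    counts = Counter()  # how often each character occurs across all labels
    for name in sorted(os.listdir(label_dir)):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, name), encoding="utf-8") as f:
            text = f.read().strip()
        counts.update(text)
        if len(text) > max_label_length:
            too_long.append(name)
        bad = sorted(set(text) - allowed)
        if bad:
            unknown[name] = bad
    return too_long, unknown, counts


# Character list as posted: digits followed by lowercase letters.
characters = list(string.digits + string.ascii_lowercase)

# Small self-contained demo on made-up label files.
with tempfile.TemporaryDirectory() as d:
    samples = [("a.txt", "ap10az0571"),
               ("b.txt", "AP10AZ0571"),
               ("c.txt", "ka01ab123456789")]
    for name, text in samples:
        with open(os.path.join(d, name), "w", encoding="utf-8") as f:
            f.write(text + "\n")
    too_long, unknown, counts = scan_labels(d, characters, max_label_length=13)

print(too_long)  # labels longer than max_label_length
print(unknown)   # labels containing characters outside the list
```

On the real dataset, the same function could be pointed at the label_directory_path values from the spec file; `counts` then shows which characters actually occur in the labels.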

Please point out possible methods to test and debug.
Thanks

Can you try max_label_length: 12?

Yes, I have tried 12, 13, 15, 20, and 25; the same error occurs.

Did you run the notebook successfully?

No, I have followed the sample notebook available in /workspace directory to reproduce the results with our custom dataset. But I haven’t run the sample notebook as it is.

To narrow this down, you can try the public dataset mentioned in the notebook.

Sure, I will try to run the sample notebook.

Hi @NitinRai
For your case, please

  • check whether your license plates contain the character ‘o’
  • if ‘o’ is needed, please train without the pretrained model.
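
One reading of the error that is consistent with this advice, sketched under assumptions: the CTC loss reserves the last class (index num_classes - 1) as the blank, which matches the "index >= num_classes - 1" wording in the message. The log shows td_dense with 36 outputs, so only label indices 0 to 34 are usable, while the posted characters.txt has 36 entries and therefore would need 37 classes. The snippet assumes a label index is simply the character's position in characters.txt; `decode` and `offending` are hypothetical helper names.

```python
import string

# Character list as posted: 10 digits followed by 26 lowercase letters.
characters = list(string.digits + string.ascii_lowercase)


def decode(indices, characters):
    """Map label indices back to characters (assumes index == position in the list)."""
    return "".join(characters[i] for i in indices)


def offending(indices, num_classes):
    """Indices that collide with or exceed the CTC blank (num_classes - 1)."""
    return [i for i in indices if i >= num_classes - 1]


labels = [10, 25, 1, 0, 10, 35, 0, 5, 7, 1]  # taken from the error message
num_classes = 36                             # td_dense output size in the log

print(decode(labels, characters))      # ap10az0571
print(offending(labels, num_classes))  # [35]
print(len(characters) + 1)             # classes needed with a blank: 37
```

Under these assumptions, index 35 ('z') lands on the blank class of a 36-way output. The output layer apparently kept the pretrained model's size (the log reports a td_dense shape mismatch and loads the model as pruned), which would explain why training without the pretrained model, or with a character list one entry shorter, avoids the error.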