LPRNet - Poor Accuracy when training from scratch

leviethung1280 · September 7, 2021, 2:34am

I am working on a License Plate Recognition task. Because my dataset contains more special Latin characters, I’ve trained the LPRNet from scratch by following steps:

Change the character dictionary by adding 3 more characters.
Don’t specify the pre-trained model which is provided by NVIDIA.

Finally, I got a very poor result. It seems to be that the model can not learn properly.

My question is:
What should I do to train the LPRNET from scratch correctly?

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

Here is my command to start the training

!lprnet train --gpus=1 --gpu_index=$GPU_INDEX \
              -e $SPECS_DIR/lpr_spec_ger.txt \
              -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
              -k $KEY

My specification:

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}

training_config {
  batch_size_per_gpu: 32
  num_epochs: 300
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 1e-6
    max_learning_rate: 1e-4
    soft_start: 0.001
    annealing: 0.7
  }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}

augmentation_config {
    output_width: 96
    output_height: 48
    output_channel: 3
    keep_original_prob: 0.3
    transform_prob: 0.5
    rotate_degree: 0
}


dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tlt-experiments/licenseplate_dataset_lprnet/german_synthesis_data_rectangle/org/1/labels"
    image_directory_path: "/workspace/tlt-experiments/licenseplate_dataset_lprnet/german_synthesis_data_rectangle/org/1/images"
  }
  characters_list_file: "/workspace/tlt-experiments/lprnet/specs/ger_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tlt-experiments/licenseplate_dataset_lprnet/21_Germany_Walking_Berlin_Munich/labels"
    image_directory_path: "/workspace/tlt-experiments/licenseplate_dataset_lprnet/21_Germany_Walking_Berlin_Munich/images"
  }
}

Morganh · September 7, 2021, 3:28am

Suggest you using the US lpr pre-trained model which is provided by NVIDIA even you are not training US dataset.
How many images did you train?

leviethung1280 · September 7, 2021, 3:42am

I initially used the US LPR pre-trained model as a starting point.
However, I could not start the training due to an error.

The output from US LPR is 36 units which are corresponding to 35 characters of US plates.

My dataset contains 39 characters in total.

There is a difference at the output layer of the LPRNet. It raised an exception immediately.

That’s a reason why I chose training from scratch.

My dataset holds 10k images, I think it’s enough for training from scratch?

Morganh · September 7, 2021, 3:44am

Which version of tlt did you use?
Please run “tlt info --verbose”

leviethung1280 · September 7, 2021, 3:47am

I am sorry that it raised the error

I pretty sure that the Notebook is running on this Docker Image.

docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3

leviethung1280 · September 7, 2021, 3:56am

Here are some outputs when starting the training with US pretrained model

The shape of this layer does not match original model: td_dense
Loading the model as a pruned model.
WARNING:tensorflow:No training configuration found in save file: the model was *not* compiled. Compile it manually.
2021-09-07 03:54:49,955 [WARNING] tensorflow: No training configuration found in save file: the model was *not* compiled. Compile it manually.
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
image_input (InputLayer)        [(None, 3, 48, 96)]  0                                            
__________________________________________________________________________________________________
tf_op_layer_Sum (TensorFlowOpLa (None, 1, 48, 96)    0           image_input[0][0]                
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 48, 96)   640         tf_op_layer_Sum[0][0]            
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 48, 96)   256         conv1[0][0]                      
__________________________________________________________________________________________________
re_lu (ReLU)                    (None, 64, 48, 96)   0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 64, 48, 96)   0           re_lu[0][0]                      
__________________________________________________________________________________________________
res2a_branch2a (Conv2D)         (None, 64, 48, 96)   36928       max_pooling2d[0][0]              
__________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_1 (ReLU)                  (None, 64, 48, 96)   0           bn2a_branch2a[0][0]              
__________________________________________________________________________________________________
res2a_branch1 (Conv2D)          (None, 64, 48, 96)   4160        max_pooling2d[0][0]              
__________________________________________________________________________________________________
res2a_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_1[0][0]                    
__________________________________________________________________________________________________
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96)   256         res2a_branch1[0][0]              
__________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add (TensorFlowOpLa (None, 64, 48, 96)   0           bn2a_branch1[0][0]               
                                                                 bn2a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_2 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add[0][0]            
__________________________________________________________________________________________________
res2b_branch2a (Conv2D)         (None, 64, 48, 96)   36928       re_lu_2[0][0]                    
__________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_3 (ReLU)                  (None, 64, 48, 96)   0           bn2b_branch2a[0][0]              
__________________________________________________________________________________________________
res2b_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_3[0][0]                    
__________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp (None, 64, 48, 96)   0           re_lu_2[0][0]                    
                                                                 bn2b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_4 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add_1[0][0]          
__________________________________________________________________________________________________
res3a_branch2a (Conv2D)         (None, 128, 24, 48)  73856       re_lu_4[0][0]                    
__________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_5 (ReLU)                  (None, 128, 24, 48)  0           bn3a_branch2a[0][0]              
__________________________________________________________________________________________________
res3a_branch1 (Conv2D)          (None, 128, 24, 48)  8320        re_lu_4[0][0]                    
__________________________________________________________________________________________________
res3a_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_5[0][0]                    
__________________________________________________________________________________________________
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48)  512         res3a_branch1[0][0]              
__________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_2 (TensorFlowOp (None, 128, 24, 48)  0           bn3a_branch1[0][0]               
                                                                 bn3a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_6 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_2[0][0]          
__________________________________________________________________________________________________
res3b_branch2a (Conv2D)         (None, 128, 24, 48)  147584      re_lu_6[0][0]                    
__________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_7 (ReLU)                  (None, 128, 24, 48)  0           bn3b_branch2a[0][0]              
__________________________________________________________________________________________________
res3b_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_7[0][0]                    
__________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_3 (TensorFlowOp (None, 128, 24, 48)  0           re_lu_6[0][0]                    
                                                                 bn3b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_8 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_3[0][0]          
__________________________________________________________________________________________________
res4a_branch2a (Conv2D)         (None, 256, 12, 24)  295168      re_lu_8[0][0]                    
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_9 (ReLU)                  (None, 256, 12, 24)  0           bn4a_branch2a[0][0]              
__________________________________________________________________________________________________
res4a_branch1 (Conv2D)          (None, 256, 12, 24)  33024       re_lu_8[0][0]                    
__________________________________________________________________________________________________
res4a_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_9[0][0]                    
__________________________________________________________________________________________________
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24)  1024        res4a_branch1[0][0]              
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_4 (TensorFlowOp (None, 256, 12, 24)  0           bn4a_branch1[0][0]               
                                                                 bn4a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_10 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_4[0][0]          
__________________________________________________________________________________________________
res4b_branch2a (Conv2D)         (None, 256, 12, 24)  590080      re_lu_10[0][0]                   
__________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_11 (ReLU)                 (None, 256, 12, 24)  0           bn4b_branch2a[0][0]              
__________________________________________________________________________________________________
res4b_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_11[0][0]                   
__________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_5 (TensorFlowOp (None, 256, 12, 24)  0           re_lu_10[0][0]                   
                                                                 bn4b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_12 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_5[0][0]          
__________________________________________________________________________________________________
res5a_branch2a (Conv2D)         (None, 300, 12, 24)  691500      re_lu_12[0][0]                   
__________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_13 (ReLU)                 (None, 300, 12, 24)  0           bn5a_branch2a[0][0]              
__________________________________________________________________________________________________
res5a_branch1 (Conv2D)          (None, 300, 12, 24)  77100       re_lu_12[0][0]                   
__________________________________________________________________________________________________
res5a_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_13[0][0]                   
__________________________________________________________________________________________________
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24)  1200        res5a_branch1[0][0]              
__________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_6 (TensorFlowOp (None, 300, 12, 24)  0           bn5a_branch1[0][0]               
                                                                 bn5a_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_14 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_6[0][0]          
__________________________________________________________________________________________________
res5b_branch2a (Conv2D)         (None, 300, 12, 24)  810300      re_lu_14[0][0]                   
__________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2a[0][0]             
__________________________________________________________________________________________________
re_lu_15 (ReLU)                 (None, 300, 12, 24)  0           bn5b_branch2a[0][0]              
__________________________________________________________________________________________________
res5b_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_15[0][0]                   
__________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2b[0][0]             
__________________________________________________________________________________________________
tf_op_layer_add_7 (TensorFlowOp (None, 300, 12, 24)  0           re_lu_14[0][0]                   
                                                                 bn5b_branch2b[0][0]              
__________________________________________________________________________________________________
re_lu_16 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_7[0][0]          
__________________________________________________________________________________________________
permute_feature (Permute)       (None, 24, 12, 300)  0           re_lu_16[0][0]                   
__________________________________________________________________________________________________
flatten_feature (Reshape)       (None, 24, 3600)     0           permute_feature[0][0]            
__________________________________________________________________________________________________
lstm (LSTM)                     (None, 24, 512)      8423424     flatten_feature[0][0]            
__________________________________________________________________________________________________
td_dense (TimeDistributed)      (None, 24, 36)       18468       lstm[0][0]                       
__________________________________________________________________________________________________
softmax (Softmax)               (None, 24, 36)       0           td_dense[0][0]                   
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
2021-09-07 03:54:50,347 [INFO] __main__: Number of images in the training dataset:	 10000
2021-09-07 03:54:50,348 [INFO] __main__: Number of images in the validation dataset:	  1514
Epoch 1/300
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 274, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 270, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 195, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 1017, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 36 labels: 25,37,16,1,3,2,9 labels seen so far: 25
	 [[{{node loss_2/softmax_loss/CTCLoss}}]]
	 [[loss_2/softmax_loss/CTCLoss/_6743]]
  (1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 36 labels: 25,37,16,1,3,2,9 labels seen so far: 25
	 [[{{node loss_2/softmax_loss/CTCLoss}}]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/lprnet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

Morganh · September 7, 2021, 4:59am

Please update to latest docker.
pip3 uninstall nvidia-tlt
pip3 install nvidia-tao

Or you can directly pull nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.08-py3

leviethung1280 · September 7, 2021, 5:52am

Thank you for your suggestion. I will try it with the second solution.

leviethung1280 · September 7, 2021, 1:26pm

Hi Morganh,

I tried your solution it worked pretty well. No more errors anymore.

Topic		Replies	Views
Error when training LPRNet 2 (characters number < 35) TAO Toolkit	6	1974	September 11, 2021
Lprnet training error (non-null label, index >= num_classes - 1) TAO Toolkit	10	1158	October 12, 2021
Error when training LPRNet TAO Toolkit tensorrt , cuda	10	1361	July 13, 2021
Error training from scratch with character 'O' in LPRNet TAO Toolkit	14	1146	June 25, 2021
Errors encountered when using TAO to train LPRnet TAO Toolkit	19	881	November 17, 2021
Get error when training lprnet with TLT3.0 lancher TAO Toolkit	7	629	October 12, 2021
LPRNet: Invalid loss, terminating training TAO Toolkit	24	2435	January 5, 2022
LPRNet training not converging for 2 line License Plate images TAO Toolkit	7	1021	September 5, 2022
Duplicate chracter in LPR Retraining using TLT-V3 TAO Toolkit	20	1023	October 12, 2021
LPRNet issue while training using custom data TAO Toolkit	3	1073	December 28, 2021

LPRNet - Poor Accuracy when training from scratch

Related topics