Error when training LPRNet 2 (characters number < 35)

Conditions:
I'm building an ALPR system like the one described in: Creating a Real-Time License Plate Detection and Recognition App | NVIDIA Developer Blog
This question is about the 'License plate recognition' part.
Training is launched on a PC with: tlt lprnet train

Driver Version: 465.19.01
CUDA Version: 11.3
TensorRT: 7.2.3
cudnn: 8.2.0.53
deepstream-app version 5.1.0
DeepStreamSDK 5.1.0

• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
alex@jetson:~$ tlt info --verbose
Configuration of the TLT Instance

dockers:
nvcr.io/nvidia/tlt-streamanalytics:
docker_tag: v3.0-dp-py3
tasks:
1. augment
2. classification
3. detectnet_v2
4. dssd
5. emotionnet
6. faster_rcnn
7. fpenet
8. gazenet
9. gesturenet
10. heartratenet
11. lprnet
12. mask_rcnn
13. retinanet
14. ssd
15. unet
16. yolo_v3
17. yolo_v4
18. tlt-converter
nvcr.io/nvidia/tlt-pytorch:
docker_tag: v3.0-dp-py3
tasks:
1. speech_to_text
2. text_classification
3. question_answering
4. token_classification
5. intent_slot_classification
6. punctuation_and_capitalization
format_version: 1.0
tlt_version: 3.0
published_date: 02/02/2021

• Training spec file(If have, please share here)
tutorial_spec_ru.txt (2.4 KB)

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 9
  arch: "baseline"
  nlayers: 18 # set nlayers to 10 to use the baseline10 model
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 24
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-5
      soft_start: 0.001
      annealing: 0.5
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  keep_original_prob: 0.3
  transform_prob: 0.5
  rotate_degree: 5
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/train/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/train/image"
  }
  characters_list_file: "/workspace/tlt-experiments/lprnet/specs/ru_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/val/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/val/image"
  }
}

ru_lp_characters.txt contains:
0
1
2
3
4
5
6
7
8
9
A
B
E
K
M
H
O
P
C
T
Y
X
D
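
Side note for anyone reusing a custom character list like this one: it can help to confirm that every character used in the training labels is actually in the list. The following is only a minimal sketch, not part of TLT. The function name find_unknown_characters and the sample labels are made up for illustration; in practice the labels would be read from the label text files.

```python
from collections import Counter


def find_unknown_characters(labels, characters):
    """Count label characters that are missing from the character list."""
    known = set(characters)
    unknown = Counter()
    for label in labels:
        for ch in label.strip():
            if ch not in known:
                unknown[ch] += 1
    return unknown


# The 23 entries of ru_lp_characters.txt as posted above.
characters = list("0123456789ABEKMHOPCTYXD")

# Hypothetical label strings, one per plate image (made up for this example).
labels = ["A123BC77", "M001MM199", "Z999ZZ01"]

unknown = find_unknown_characters(labels, characters)
print(dict(unknown))
```

An empty result means the labels and the character list agree; anything else points to labels that the list cannot represent.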

Error
alex@jetson:~$ tlt lprnet train -e /workspace/tlt-experiments/lprnet/tutorial_spec.txt -r /workspace/tlt-experiments/lprnet/ -k nvidia_tlt -m /workspace/tlt-experiments/lprnet/us_lprnet_baseline18_trainable.tlt
2021-07-06 11:11:35,706 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-07-06 08:11:42,448 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-07-06 08:11:42,448 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-07-06 08:11:42,557 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-07-06 08:11:42,558 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-07-06 08:11:44,541 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-07-06 08:11:44,542 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/lprnet/tutorial_spec_ru.txt
2021-07-06 08:11:44,545 [INFO] __main__: Loading pretrained weights. This may take a while…
The shape of this layer does not match original model: td_dense
Loading the model as a pruned model.
WARNING:tensorflow:No training configuration found in save file: the model was not compiled. Compile it manually.
2021-07-06 08:12:19,613 [WARNING] tensorflow: No training configuration found in save file: the model was not compiled. Compile it manually.
Initialize optimizer
Model: “lpnet_baseline_18”


Layer (type) Output Shape Param # Connected to

image_input (InputLayer) [(None, 3, 48, 96)] 0


tf_op_layer_Sum (TensorFlowOpLa (None, 1, 48, 96) 0 image_input[0][0]


conv1 (Conv2D) (None, 64, 48, 96) 640 tf_op_layer_Sum[0][0]


bn_conv1 (BatchNormalization) (None, 64, 48, 96) 256 conv1[0][0]


re_lu (ReLU) (None, 64, 48, 96) 0 bn_conv1[0][0]


max_pooling2d (MaxPooling2D) (None, 64, 48, 96) 0 re_lu[0][0]


res2a_branch2a (Conv2D) (None, 64, 48, 96) 36928 max_pooling2d[0][0]


bn2a_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2a[0][0]


re_lu_1 (ReLU) (None, 64, 48, 96) 0 bn2a_branch2a[0][0]


res2a_branch1 (Conv2D) (None, 64, 48, 96) 4160 max_pooling2d[0][0]


res2a_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_1[0][0]


bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96) 256 res2a_branch1[0][0]


bn2a_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2b[0][0]


tf_op_layer_add (TensorFlowOpLa (None, 64, 48, 96) 0 bn2a_branch1[0][0]
bn2a_branch2b[0][0]


re_lu_2 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add[0][0]


res2b_branch2a (Conv2D) (None, 64, 48, 96) 36928 re_lu_2[0][0]


bn2b_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2a[0][0]


re_lu_3 (ReLU) (None, 64, 48, 96) 0 bn2b_branch2a[0][0]


res2b_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_3[0][0]


bn2b_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2b[0][0]


tf_op_layer_add_1 (TensorFlowOp (None, 64, 48, 96) 0 re_lu_2[0][0]
bn2b_branch2b[0][0]


re_lu_4 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add_1[0][0]


res3a_branch2a (Conv2D) (None, 128, 24, 48) 73856 re_lu_4[0][0]


bn3a_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2a[0][0]


re_lu_5 (ReLU) (None, 128, 24, 48) 0 bn3a_branch2a[0][0]


res3a_branch1 (Conv2D) (None, 128, 24, 48) 8320 re_lu_4[0][0]


res3a_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_5[0][0]


bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48) 512 res3a_branch1[0][0]


bn3a_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2b[0][0]


tf_op_layer_add_2 (TensorFlowOp (None, 128, 24, 48) 0 bn3a_branch1[0][0]
bn3a_branch2b[0][0]


re_lu_6 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_2[0][0]


res3b_branch2a (Conv2D) (None, 128, 24, 48) 147584 re_lu_6[0][0]


bn3b_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2a[0][0]


re_lu_7 (ReLU) (None, 128, 24, 48) 0 bn3b_branch2a[0][0]


res3b_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_7[0][0]


bn3b_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2b[0][0]


tf_op_layer_add_3 (TensorFlowOp (None, 128, 24, 48) 0 re_lu_6[0][0]
bn3b_branch2b[0][0]


re_lu_8 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_3[0][0]


res4a_branch2a (Conv2D) (None, 256, 12, 24) 295168 re_lu_8[0][0]


bn4a_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2a[0][0]


re_lu_9 (ReLU) (None, 256, 12, 24) 0 bn4a_branch2a[0][0]


res4a_branch1 (Conv2D) (None, 256, 12, 24) 33024 re_lu_8[0][0]


res4a_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_9[0][0]


bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24) 1024 res4a_branch1[0][0]


bn4a_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2b[0][0]


tf_op_layer_add_4 (TensorFlowOp (None, 256, 12, 24) 0 bn4a_branch1[0][0]
bn4a_branch2b[0][0]


re_lu_10 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_4[0][0]


res4b_branch2a (Conv2D) (None, 256, 12, 24) 590080 re_lu_10[0][0]


bn4b_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2a[0][0]


re_lu_11 (ReLU) (None, 256, 12, 24) 0 bn4b_branch2a[0][0]


res4b_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_11[0][0]


bn4b_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2b[0][0]


tf_op_layer_add_5 (TensorFlowOp (None, 256, 12, 24) 0 re_lu_10[0][0]
bn4b_branch2b[0][0]


re_lu_12 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_5[0][0]


res5a_branch2a (Conv2D) (None, 300, 12, 24) 691500 re_lu_12[0][0]


bn5a_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2a[0][0]


re_lu_13 (ReLU) (None, 300, 12, 24) 0 bn5a_branch2a[0][0]


res5a_branch1 (Conv2D) (None, 300, 12, 24) 77100 re_lu_12[0][0]


res5a_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_13[0][0]


bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24) 1200 res5a_branch1[0][0]


bn5a_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2b[0][0]


tf_op_layer_add_6 (TensorFlowOp (None, 300, 12, 24) 0 bn5a_branch1[0][0]
bn5a_branch2b[0][0]


re_lu_14 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_6[0][0]


res5b_branch2a (Conv2D) (None, 300, 12, 24) 810300 re_lu_14[0][0]


bn5b_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2a[0][0]


re_lu_15 (ReLU) (None, 300, 12, 24) 0 bn5b_branch2a[0][0]


res5b_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_15[0][0]


bn5b_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2b[0][0]


tf_op_layer_add_7 (TensorFlowOp (None, 300, 12, 24) 0 re_lu_14[0][0]
bn5b_branch2b[0][0]


re_lu_16 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_7[0][0]


permute_feature (Permute) (None, 24, 12, 300) 0 re_lu_16[0][0]


flatten_feature (Reshape) (None, 24, 3600) 0 permute_feature[0][0]


lstm (LSTM) (None, 24, 512) 8423424 flatten_feature[0][0]


td_dense (TimeDistributed) (None, 24, 36) 18468 lstm[0][0]


softmax (Softmax) (None, 24, 36) 0 td_dense[0][0]

Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608


2021-07-06 08:12:20,258 [INFO] __main__: Number of images in the training dataset: 44164
2021-07-06 08:12:20,258 [INFO] __main__: Number of images in the validation dataset: 4954
Epoch 1/24
1/1381 […] - ETA: 4:46:55 - loss: 30.6878WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (1.531048). Check your callbacks.
2021-07-06 08:12:33,920 [WARNING] tensorflow: Method (on_train_batch_end) is slow compared to the batch update (1.531048). Check your callbacks.
1380/1381 [============================>.] - ETA: 0s - loss: 1.5159
5d5412b7fe57:38:60 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
5d5412b7fe57:38:60 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
5d5412b7fe57:38:60 [0] NCCL INFO NET/IB : No device found.
5d5412b7fe57:38:60 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
5d5412b7fe57:38:60 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
5d5412b7fe57:38:60 [0] NCCL INFO Channel 00/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 01/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 02/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 03/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 04/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 05/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 06/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 07/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 08/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 09/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 10/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 11/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 12/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 13/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 14/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 15/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 16/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 17/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 18/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 19/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 20/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 21/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 22/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 23/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 24/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 25/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 26/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 27/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 28/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 29/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 30/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Channel 31/32 : 0
5d5412b7fe57:38:60 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
5d5412b7fe57:38:60 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
5d5412b7fe57:38:60 [0] NCCL INFO comm 0x7f4642f34040 rank 0 nranks 1 cudaDev 0 busId 1000 - Init COMPLETE

Epoch 00001: saving model to /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-01.tlt
1381/1381 [==============================] - 159s 115ms/step - loss: 1.5151
Epoch 2/24
1380/1381 [============================>.] - ETA: 0s - loss: 0.2815
Epoch 00002: saving model to /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-02.tlt
1381/1381 [==============================] - 132s 95ms/step - loss: 0.2815
Epoch 3/24
1380/1381 [============================>.] - ETA: 0s - loss: 0.2369
Epoch 00003: saving model to /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-03.tlt
1381/1381 [==============================] - 135s 98ms/step - loss: 0.2369
Epoch 4/24
1380/1381 [============================>.] - ETA: 0s - loss: 0.2174
Epoch 00004: saving model to /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-04.tlt
1381/1381 [==============================] - 141s 102ms/step - loss: 0.2175
Epoch 5/24
1380/1381 [============================>.] - ETA: 0s - loss: 0.2088
Epoch 00005: saving model to /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-05.tlt
Traceback (most recent call last):
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py”, line 274, in
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py”, line 270, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py”, line 195, in run_experiment
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py”, line 727, in fit
use_multiprocessing=use_multiprocessing)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py”, line 603, in fit
steps_name=‘steps_per_epoch’)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py”, line 332, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/callbacks.py”, line 299, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/callbacks/ac_callback.py”, line 65, in on_epoch_end
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/callbacks/ac_callback.py”, line 42, in _get_accuracy
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/ctc_decoder.py”, line 33, in decode_ctc_conf
IndexError: list index out of range
Traceback (most recent call last):
File “/usr/local/bin/lprnet”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py”, line 12, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py”, line 296, in launch_job
AssertionError: Process run failed.
2021-07-06 11:24:01,934 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

If I pad the character list to 35 entries, as in us_lp_characters.txt, the error disappears, but the recognized plate numbers then contain too many wrong extra letters.

Question: how do I fix this?
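
For readers hitting the same traceback: one plausible reading is that the network head printed above has 36 output classes (td_dense is (None, 24, 36)) while the character list has only 23 entries, so the decoder can be handed a class index that has no entry in the list. The sketch below is an assumption-laden illustration of that mechanism, not the code in TLT's ctc_decoder.py. The function greedy_ctc_decode and the choice of blank index (equal to the list length) are hypothetical.

```python
def greedy_ctc_decode(class_ids, characters):
    """Greedy CTC decoding: collapse repeated indices and drop blanks.

    Assumption: the blank class index equals len(characters).
    """
    blank = len(characters)
    decoded = []
    previous = None
    for idx in class_ids:
        if idx != previous and idx != blank:
            # Raises IndexError when idx points past the character list.
            decoded.append(characters[idx])
        previous = idx
    return "".join(decoded)


# The 23 entries of ru_lp_characters.txt as posted above.
characters = list("0123456789ABEKMHOPCTYXD")

# Indices that stay within the list decode normally.
plate = greedy_ctc_decode([10, 10, 23, 1, 1, 23, 2], characters)
print(plate)

# An index that only exists in a 36-class head has no character to map to.
try:
    greedy_ctc_decode([10, 30], characters)
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```

This would also be consistent with the error going away when the list is padded to 35 entries, at the cost of the model being able to predict the padding characters.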

Please update to TLT 3.0-py3. You are running TLT 3.0-dp-py3.
The issue does not occur in 3.0-py3.

I cannot get tlt to use the new version. I pulled the new image with:
docker pull nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3

docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvcr.io/nvidia/tlt-streamanalytics v3.0-py3 344204c03cbe 4 weeks ago 17.2GB
tensorflow/tensorflow latest-gpu-jupyter 346d69d2c7f8 7 weeks ago 5.91GB
tensorflow/tensorflow latest 1d932048a281 7 weeks ago 1.3GB
tensorflow/tensorflow latest-gpu 8b9d78381e5d 7 weeks ago 5.74GB
hello-world latest d1165f221234 4 months ago 13.3kB
nvcr.io/nvidia/tlt-streamanalytics v3.0-dp-py3 a865982b80a3 5 months ago 15.5GB
nvidia/cuda 11.0-base 2ec708416bb8 10 months ago 122MB

I removed all containers and then removed the old image itself with:
docker rmi nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3

docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvcr.io/nvidia/tlt-streamanalytics v3.0-py3 344204c03cbe 4 weeks ago 17.2GB
tensorflow/tensorflow latest-gpu-jupyter 346d69d2c7f8 7 weeks ago 5.91GB
tensorflow/tensorflow latest 1d932048a281 7 weeks ago 1.3GB
tensorflow/tensorflow latest-gpu 8b9d78381e5d 7 weeks ago 5.74GB
hello-world latest d1165f221234 4 months ago 13.3kB
nvidia/cuda 11.0-base 2ec708416bb8 10 months ago 122MB

Then I launched: tlt lprnet train
It waits about 13 minutes (probably loading the old docker image) and then gives the same error.

I tried "sudo systemctl restart docker" and a reboot. The result is the same: it still uses the old 'v3.0-dp-py3' image.

What did I miss?

See TLT Quick Start Guide — Transfer Learning Toolkit 3.0 documentation
Please update tlt using the method below.

The nvidia-tlt package is hosted in the nvidia-pyindex, which has to be installed as a prerequisite to installing nvidia-tlt.

If you had installed an older version of nvidia-tlt launcher, you may upgrade to the latest version by running the following command.

pip3 install --upgrade nvidia-tlt
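
After upgrading, the docker_tag reported by "tlt info --verbose" shows which image the launcher will use, regardless of which images are present locally. As a rough sketch only, the tag can be pulled out of that output with a few lines of Python. The helper docker_tags is hypothetical, and the parsing assumes the output layout shown earlier in this thread.

```python
import re


def docker_tags(info_text):
    """Map each docker image name to the docker_tag listed under it."""
    tags = {}
    image = None
    for raw in info_text.splitlines():
        line = raw.strip()
        image_match = re.match(r"^(nvcr\.io/\S+):$", line)
        if image_match:
            image = image_match.group(1)
            continue
        tag_match = re.match(r"^docker_tag:\s*(\S+)$", line)
        if tag_match and image is not None:
            tags[image] = tag_match.group(1)
    return tags


# Excerpt of the "tlt info --verbose" output posted at the top of the thread.
sample = """
dockers:
nvcr.io/nvidia/tlt-streamanalytics:
docker_tag: v3.0-dp-py3
tasks:
1. augment
nvcr.io/nvidia/tlt-pytorch:
docker_tag: v3.0-dp-py3
tasks:
1. speech_to_text
"""

tags = docker_tags(sample)
print(tags)
```

To check a live install, the same function could be fed the text captured from running the command, for example via subprocess.run(["tlt", "info", "--verbose"], capture_output=True, text=True).stdout.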

OK, that worked. I updated nvidia-tlt as you said and launched again. It complained:
"google.protobuf.text_format.ParseError: 57:5 : Message type "AugmentationConfig" has no field named "transform_prob"."
So I changed the 'tutorial_spec.txt' file to use the new field names, and training completed fine.

The export step also reported an error:
alex@jetson:~/tlt-experiments/lprnet$ tlt lprnet export -m /workspace/tlt-experiments/lprnet/weights/lprnet_epoch-24.tlt -k nvidia_tlt -e /workspace/tlt-experiments/lprnet/tutorial_spec_ru.txt
2021-07-06 18:39:48,559 [INFO] root: Registry: [‘nvcr.io’]
2021-07-06 18:39:48,602 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2021-07-06 15:39:53,772 [INFO] iva.common.export.keras_exporter: Using input nodes: [‘image_input’]
2021-07-06 15:39:53,772 [INFO] iva.common.export.keras_exporter: Using output nodes: [‘tf_op_layer_ArgMax’, ‘tf_op_layer_Max’]
2021-07-06 15:39:53,772 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tlt-experiments/lprnet/tutorial_spec_ru.txt
The ONNX operator number change on the optimization: 132 → 61
2021-07-06 15:40:03,987 [INFO] keras2onnx: The ONNX operator number change on the optimization: 132 → 61
Traceback (most recent call last):
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/export.py”, line 215, in
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/export.py”, line 142, in main
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/export.py”, line 211, in run_export
File “/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py”, line 371, in export
TypeError: set_data_preprocessing_parameters() got an unexpected keyword argument ‘image_mean’

Nevertheless, it created the file 'lprnet_epoch-24.etlt' (Morganh, can you comment on this?). I then ran the 'evaluate' command, which was OK, copied the file to the Jetson, converted it, and ran deepstream-lpr-app: recognition is much better.

A side question: does it make any difference now if I train on .png images instead of .jpg?

Thank you, Morganh, for the fast and useful replies.

This is a known regression in 3.0-py3. You can ignore it, because the etlt file is generated successfully. We will fix this issue in the next release.

LPRNet should support both jpg and png files.
