LPRNet raise ValueError("index can't contain negative values")

rishika.v · September 8, 2021, 7:33am

Hi, I am running LPRNet training on a custom dataset and I am getting the error below. It would be great if someone could help.

24/1139 […] - ETA: 8:12 - loss: 10.6416
Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 121, in __getitem__
  File "<__array_function__ internals>", line 6, in pad
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 793, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 560, in _as_pairs
    raise ValueError("index can't contain negative values")
ValueError: index can't contain negative values

Morganh · September 8, 2021, 9:25am

Can you share your training spec and the characters txt file?

rishika.v · September 8, 2021, 9:33am

Character text file

0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
G
H
I
J
K
L
M
N
P
Q
R
S
T
U
V
W
X
Y
Z

rishika.v · September 8, 2021, 9:34am

Here is my training spec file.

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 10
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 200
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.001
      annealing: 0.5
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  max_rotate_degree: 5
  rotate_prob: 0.5
  gaussian_kernel_size: 5
  gaussian_kernel_size: 7
  gaussian_kernel_size: 15
  blur_prob: 0.5
  reverse_color_prob: 0.5
  keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/train/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/train/image"
  }
  characters_list_file: "/workspace/tlt-experiments/lprnet/specs/us_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/val/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/val/image"
  }
}

Morganh · September 8, 2021, 10:28am

Can you follow the LPRNet Jupyter notebook and retry?

Morganh · September 8, 2021, 10:32am

Since you are training on your custom images, please reduce the number of images to check whether some of them are causing the issue.

rishika.v · September 8, 2021, 10:43am

I am training using the Jupyter notebook.
Regarding the images, I trained on batches of images and set aside the batches that ran without any issue. When I then combined them into the entire dataset, I got this error again.

Is it something to do with batch_size, number_of_steps_per_epoch, workers, or something else?

rishika.v · September 8, 2021, 12:46pm

Hi @Morganh I could really use your help now.

Morganh · September 8, 2021, 1:30pm

I suspect there is a problem with one or several images.
You can narrow it down as follows:

Try 100 images. If there are no issues,
try 500 images. If there are no issues,
try 1000 images. If training fails at this point,

then the problem lies somewhere between the 500th and the 1000th image.
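The growing-subset procedure above can be automated as a binary search over dataset prefixes. The following is only a sketch: `fails` is a hypothetical callback (not part of TLT/TAO) that you would implement yourself, for example by copying the given files into a scratch folder, running a short training on them, and returning True if it crashes. It assumes that once a prefix of the dataset fails, every longer prefix fails too.

```python
from typing import Callable, Sequence


def first_failing_prefix(items: Sequence[str],
                         fails: Callable[[Sequence[str]], bool]) -> int:
    """Return the smallest n such that fails(items[:n]) is True.

    Returns -1 if the full set does not fail. `fails` is a user-supplied,
    hypothetical check (e.g. "does a short training run on this subset crash?").
    """
    if not fails(items):
        return -1
    lo, hi = 0, len(items)  # invariant: items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[:mid]):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Toy stand-in for a real training run: one made-up file name is "bad".
    names = [f"img_{i:04d}" for i in range(1000)]
    n = first_failing_prefix(names, lambda subset: "img_0742" in subset)
    print("first failing item:", names[n - 1])
```

With about 36,000 images this needs roughly 16 trial runs instead of trying subset sizes by hand. The item at index `n - 1` is the first one whose inclusion makes the run fail.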

rishika.v · September 9, 2021, 8:48am

Hi @Morganh,
This is the first debugging step that I did.
When I combined the subsets that had no issues, I ran into the same error again.
It would be great if you could let me know which parameter is causing this issue.
It could be something to do with batch size, steps_per_epoch, workers, GPU, or something else I am not aware of…

rishika.v · September 9, 2021, 8:53am

I would like to understand what my “LABEL” file should look like, i.e. what should be inside the ‘.txt’ file.
The image is named RJ*******_lp.jpg, and the “.txt” file holds “RJ*******”…
I hope there is no problem with this format?

Morganh · September 9, 2021, 9:19am

Can you share one image and its label?

Morganh · September 9, 2021, 9:21am

Also, have you ever trained with the OpenALPR dataset that is mentioned in the notebook? If so, was it successful?

rishika.v · September 9, 2021, 9:26am

Hi @Morganh,
I trained with the OpenALPR dataset first, to start off, and it was successful.
I will attach the image along with its label.

rishika.v · September 9, 2021, 9:47am

GJ07AR1769_lp.txt (10 Bytes)

I hope this helps

Morganh · September 9, 2021, 9:53am

Do all the labels have a maximum length of 10?

Also, how many images are in your training dataset?
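A note on why label length matters here: the traceback ends inside `numpy.pad`, called from the LPRNet data loader (`data_sequence.py`). `numpy.pad` raises exactly this `ValueError` when it is given a negative pad width. The sketch below assumes, without access to the loader's source, that each encoded label is right-padded by `max_label_length - len(label)`; `pad_label` is a hypothetical stand-in for that step, not the actual TLT code.

```python
import numpy as np

MAX_LABEL_LENGTH = 10  # same value as max_label_length in the spec above


def pad_label(encoded_label, max_len=MAX_LABEL_LENGTH):
    """Right-pad an encoded label to max_len entries.

    Hypothetical stand-in for the loader's padding step (an assumption).
    """
    pad_amount = max_len - len(encoded_label)  # negative if the label is too long
    return np.pad(encoded_label, (0, pad_amount), mode="constant")


if __name__ == "__main__":
    print(pad_label(np.arange(9)).shape)  # a 9-character label pads without error
    try:
        pad_label(np.arange(11))          # an 11-character label gives pad width -1
    except ValueError as err:
        print("ValueError:", err)
```

If that assumption holds, a single label longer than `max_label_length` anywhere in the dataset would be enough to stop training, and a small subset would run cleanly as long as it happens not to contain such a label.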

rishika.v · September 9, 2021, 10:13am

Do all the labels need to have length 10? They can be shorter as well, right?

The dataset has 36437 images, but I don't think it has anything to do with the dataset size, as I ran it on 51 images to test whether there was any issue with the dataset.
It worked completely fine!

Morganh · September 9, 2021, 10:24am

So, training ran fine on 51 images but failed on the 36437 images?

rishika.v · September 9, 2021, 10:27am

Yes, that is correct.
Besides that, I just tested with an image whose label length is 9, keeping max_label_length=10, and I did not get any error. It worked fine!
The issue is somewhere else.

Morganh · September 9, 2021, 10:27am

Can you check whether any images or labels have a length of more than 10?
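A check like this can be scripted instead of inspecting 36437 label files by hand. The following is a minimal sketch using only the standard library. The helper names are made up for illustration, and stripping surrounding whitespace before measuring the label is an assumption about how the loader reads the file. It also flags characters that are missing from the characters list (the list posted above has no letter O, for example), although whether unlisted characters are related to this particular error is not established in this thread.

```python
from pathlib import Path


def load_characters(characters_file):
    """Read the characters list file (one character per line) into a set."""
    lines = Path(characters_file).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def audit_labels(label_dir, characters, max_label_length=10):
    """Return (file name, label, reason) tuples for suspicious label files."""
    problems = []
    for label_path in sorted(Path(label_dir).glob("*.txt")):
        label = label_path.read_text().strip()
        if not label:
            problems.append((label_path.name, label, "empty label"))
            continue
        if len(label) > max_label_length:
            reason = f"length {len(label)} > {max_label_length}"
            problems.append((label_path.name, label, reason))
        unknown = sorted(set(label) - characters)
        if unknown:
            reason = "characters not in list: " + "".join(unknown)
            problems.append((label_path.name, label, reason))
    return problems


if __name__ == "__main__":
    # Paths taken from the spec posted above; adjust to your setup.
    chars = load_characters(
        "/workspace/tlt-experiments/lprnet/specs/us_lp_characters.txt")
    for name, label, reason in audit_labels(
            "/workspace/tlt-experiments/data/openalpr/train/label", chars):
        print(f"{name}: {label!r} ({reason})")
```

Running it against both the training and the validation label directories covers both data sources in the spec.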
