LPRNet raise ValueError("index can't contain negative values")

Hi, I am running LPRNet training on a custom dataset and I am getting the error below. It would be great if someone could help.

24/1139 […] - ETA: 8:12 - loss: 10.6416
Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 121, in __getitem__
  File "<array_function internals>", line 6, in pad
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 793, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 560, in _as_pairs
    raise ValueError("index can't contain negative values")
ValueError: index can't contain negative values
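The last frames show that the exception comes from `numpy.pad`, called from the LPRNet dataloader's `__getitem__`, and NumPy raises exactly this `ValueError` whenever a pad width is negative. Below is a minimal reproduction under one assumption: that the dataloader right-pads each encoded label up to `max_label_length` (the TLT source is not shown in this thread, so this is inferred from the traceback only). `pad_label` is a hypothetical helper, not a TLT function.

```python
import numpy as np

MAX_LABEL_LENGTH = 10  # same value as max_label_length in the spec


def pad_label(encoded, max_len=MAX_LABEL_LENGTH, pad_value=-1):
    """Hypothetical helper: right-pad an encoded label to max_len.

    The pad width is max_len - len(encoded), so it becomes negative
    as soon as a label is longer than max_len.
    """
    encoded = np.asarray(encoded)
    return np.pad(encoded, (0, max_len - len(encoded)),
                  mode="constant", constant_values=pad_value)


print(pad_label(list(range(9))))  # 9 characters: pad width 1, no error
try:
    pad_label(list(range(11)))    # 11 characters: pad width -1
except ValueError as err:
    print("ValueError:", err)
```

If that assumption holds, a single over-long label anywhere in the dataset is enough to crash training as soon as its batch is loaded, independent of batch size or the number of workers.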

Can you share your training spec and the characters .txt file?

Here is the characters text file:

0
1
2
3
4
5
6
7
8
9
A
B
C
D
E
F
G
H
I
J
K
L
M
N
P
Q
R
S
T
U
V
W
X
Y
Z
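Presumably every character in every label has to appear in this list. Note that it has 35 entries and contains no letter `O`, so a plate typed with `O` instead of `0`, or typed in lowercase, would not match. A minimal sketch of a per-label check; `load_characters` and `unknown_chars` are hypothetical helpers, and `CHARACTERS` simply mirrors the file above.

```python
import string
from pathlib import Path


def load_characters(path):
    """Read a characters file: one character per line, blank lines ignored."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


# Mirrors the file above: digits 0-9 plus A-Z without "O" (35 characters).
CHARACTERS = set(string.digits + string.ascii_uppercase) - {"O"}


def unknown_chars(label, characters=CHARACTERS):
    """Return the sorted characters of `label` that are not in `characters`."""
    return sorted(set(label) - characters)


print(unknown_chars("GJ07AR1769"))  # [] -> every character is in the list
print(unknown_chars("gj07ar1769"))  # ['a', 'g', 'j', 'r'] -> lowercase is flagged
```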

Here is my training spec file.

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 10
  arch: "baseline"
  nlayers: 18 # set nlayers to 10 instead to use the baseline10 model
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 200
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-4
      soft_start: 0.001
      annealing: 0.5
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  max_rotate_degree: 5
  rotate_prob: 0.5
  gaussian_kernel_size: 5
  gaussian_kernel_size: 7
  gaussian_kernel_size: 15
  blur_prob: 0.5
  reverse_color_prob: 0.5
  keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/train/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/train/image"
  }
  characters_list_file: "/workspace/tlt-experiments/lprnet/specs/us_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/openalpr/val/label"
    image_directory_path: "/workspace/tlt-experiments/data/openalpr/val/image"
  }
}

Can you follow the LPRNet Jupyter notebook and retry?

Since you are training on your own custom images, please try with fewer images to check whether some of the images have issues.

I am training using the Jupyter notebook.
With respect to the images, I trained on separate batches of images and kept the ones that ran without any issue, but when I combined them back into the entire dataset, I got this error again.

Could it have something to do with batch_size, the number of steps per epoch, the number of workers, or something else?

Hi @Morganh, I could really use your help now.

I suspect there is an issue with one or several images.
You can narrow it down this way:

  1. Try 100 images. If there are no issues,
  2. try 500 images. If there are no issues,
  3. try 1000 images. If training fails at this step,

then the problem is in one of the images between the 500th and the 1000th.
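The steps above are a search for the first bad sample. When each test is cheap, a binary search over prefixes needs only about log2(N) checks, roughly 16 for 36437 samples. A minimal sketch, assuming failure is monotone (once a bad sample is included, every larger prefix also fails); `first_failing_index` and `prefix_fails` are hypothetical names.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_failing_index(items: Sequence[T],
                        prefix_fails: Callable[[Sequence[T]], bool]) -> Optional[int]:
    """Return the index of the first item whose inclusion makes a prefix fail.

    Assumes monotone failure: if items[:k] fails, items[:k + 1] fails too.
    Returns None if the full sequence does not fail.
    """
    if not prefix_fails(items):
        return None
    lo, hi = 0, len(items)  # invariant: items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if prefix_fails(items[:mid]):
            hi = mid
        else:
            lo = mid
    return hi - 1  # items[:hi - 1] passes, so items[hi - 1] is the culprit


# Hypothetical labels; the third one has 11 characters.
labels = ["GJ07AR1769", "RJ14CA1234", "MH12AB12345", "KA01AA0001"]
print(first_failing_index(labels, lambda sub: any(len(s) > 10 for s in sub)))  # 2
```

With a cheap per-sample check such as label length, a direct scan is simpler; bisection pays off when the only available test is an expensive training run.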

Hi @Morganh,
That was the first debugging step I tried. When I combined the subsets that had no issues, I ran into the same error again.
It would be great if you could tell me which parameter is causing this. It could be batch size, steps_per_epoch, workers, the GPU, or something else I don't know about…

I also wanted to understand what my label file should look like, i.e. what should be inside the .txt file.
My images are named like RJ*******_lp.jpg, and the ".txt" file holds "RJ*******"…
I hope there is no problem with this format?
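Whatever the naming scheme, each image presumably needs a label file with the same base name in the label directory (this pairing rule is an assumption here; check the TLT LPRNet documentation). A minimal sketch that lists unpaired files by stem, assuming `.jpg` images and `.txt` labels; other extensions are ignored, and `unpaired_files` is a hypothetical helper.

```python
from pathlib import Path


def unpaired_files(image_dir, label_dir, image_ext=".jpg", label_ext=".txt"):
    """Return (images without a label, labels without an image), by file stem.

    Only files with exactly `image_ext` / `label_ext` are considered.
    """
    image_stems = {p.stem for p in Path(image_dir).glob("*" + image_ext)}
    label_stems = {p.stem for p in Path(label_dir).glob("*" + label_ext)}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, call it with the `image_directory_path` and `label_directory_path` from the spec; both returned lists should be empty.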

Can you share one image and its label?

Also, have you ever trained with the OpenALPR dataset mentioned in the notebook? If so, was it successful?

Hi @Morganh,
Yes, I trained on the OpenALPR dataset first to start off with, and it was successful.
I will attach the image along with its label.

GJ07AR1769_lp
GJ07AR1769_lp.txt (10 Bytes)

I hope this helps.
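The attachment size is a useful clue. The label presumably contains `GJ07AR1769`, which is exactly 10 ASCII characters, so a 10-byte file means this particular label has no trailing newline. Label files saved with a trailing newline (`\n`, or `\r\n` from Windows editors) would be 11 or 12 bytes; whether that matters depends on whether the loader strips whitespace, which is an assumption since the TLT dataloader source is not shown. A minimal sketch comparing raw and stripped lengths; `label_length` is a hypothetical helper.

```python
def label_length(raw: bytes, strip: bool) -> int:
    """Number of characters in a label file's contents, optionally stripped."""
    text = raw.decode("utf-8")
    return len(text.strip() if strip else text)


# The same plate saved without a newline, with "\n", and with Windows "\r\n".
for raw in (b"GJ07AR1769", b"GJ07AR1769\n", b"GJ07AR1769\r\n"):
    print(f"{len(raw)} bytes: {label_length(raw, strip=False)} chars raw, "
          f"{label_length(raw, strip=True)} chars stripped")
```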

Do all the labels have a maximum length of 10?

Also, how many images are in your training dataset?

Do all the labels need to have length 10? They can be shorter as well, right?

The dataset has 36437 images, but I don't think the problem is the dataset size: I ran training on 51 images to test whether there was any issue with the data, and it worked completely fine!

So training on 51 images worked, but training on all 36437 images failed?

Yes, that is correct.
Besides that, I just tested with an image whose label has length 9, keeping max_label_length: 10, and I did not get any error. It worked fine!
The issue is somewhere else.

Can you check whether any of your labels are longer than 10 characters?
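A minimal sketch of that check, assuming one label per `.txt` file in the label directory and that surrounding whitespace is stripped when labels are read (both assumptions). It also flags empty labels and characters missing from the characters list. `find_bad_labels` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_bad_labels(label_dir, characters: Iterable[str],
                    max_label_length: int = 10) -> List[Tuple[str, str, str]]:
    """Scan *.txt label files and return (file name, label, problem) tuples."""
    allowed = set(characters)
    problems = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        label = path.read_text(encoding="utf-8").strip()
        if not label:
            problems.append((path.name, label, "empty label"))
        elif len(label) > max_label_length:
            problems.append((path.name, label,
                             f"length {len(label)} > {max_label_length}"))
        unknown = sorted(set(label) - allowed)
        if unknown:
            problems.append((path.name, label, f"unknown characters {unknown}"))
    return problems
```

Pass the `label_directory_path` from the spec and the characters read from `characters_list_file` (one per line); running it on the validation labels as well costs little.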