Hi, I am running LPRNet training on a custom dataset and I am getting the issue below. It would be great if someone could help.
24/1139 […] - ETA: 8:12 - loss: 10.6416
Traceback (most recent call last):
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/opt/tlt/.cache/dazel/_dazel_tlt/2b81a5aac84a1d3b7a324f2a7a6f400b/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 121, in __getitem__
  File "<__array_function__ internals>", line 6, in pad
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 793, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 560, in _as_pairs
    raise ValueError("index can't contain negative values")
ValueError: index can't contain negative values
I am training from the Jupyter notebook.
As for the images: I took batches of images and sorted out the ones that trained without any issue, but when I combined them back into the full dataset, I got this error again.
Is it something to do with batch_size, steps_per_epoch, workers, or something else?
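For reference, this ValueError is exactly what numpy raises when np.pad receives a negative pad width, which points at the data rather than at batch_size or workers. My guess (an assumption, since I can't see inside data_sequence.py) is that line 121 pads each label up to max_label_length, in which case any sample whose label is longer than max_label_length would produce a negative width. A minimal sketch in plain numpy that reproduces the exact message:

import numpy as np

max_label_length = 10
label = np.zeros(11)  # pretend one sample's encoded label has 11 characters

# pad the label up to max_label_length (what I assume data_sequence.py does)
pad_width = max_label_length - len(label)  # -1 for this sample

# np.pad validates pad widths as indices, so a negative one raises
# "ValueError: index can't contain negative values"
np.pad(label, (0, pad_width), mode="constant")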
Hi @Morganh,
That is the first debugging step I did.
When I combined the images that trained without any issue back into one dataset, it ran into the same error again.
It would be great if you could let me know which parameter is causing this issue. It could be something to do with batch size, steps_per_epoch, workers, the GPU, or something else entirely.
I wanted to understand what my label file should look like, i.e. what should be inside the ".txt" file.
RJ*******_lp.jpg is how the image is named, and the ".txt" file holds "RJ*******"…
I hope there is no problem with this format, correct?
Should all the labels have length 10, or can they be shorter as well?
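If the padding theory above holds, one quick check is to scan every ".txt" label for a length greater than max_label_length, including any stray whitespace or extra newline an editor may have left in the file. A rough sketch (the label directory and max_label_length=10 are assumptions based on my setup):

import glob
import os

MAX_LABEL_LENGTH = 10          # assumption: matches max_label_length in the spec file
LABEL_DIR = "path/to/labels"   # assumption: directory holding the .txt label files

for path in sorted(glob.glob(os.path.join(LABEL_DIR, "*.txt"))):
    with open(path) as f:
        raw = f.read()
    label = raw.strip()
    # flag labels with unexpected whitespace (anything beyond one trailing newline)
    if raw not in (label, label + "\n"):
        print(f"{path}: unexpected whitespace around label {label!r}")
    if len(label) > MAX_LABEL_LENGTH:
        print(f"{path}: label {label!r} has length {len(label)} > {MAX_LABEL_LENGTH}")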
The dataset size is 36437 images, but I don't think it has anything to do with the dataset size, as I ran training on 51 images to test whether there was any issue with the data, and it worked completely fine!
Yes, that is correct.
Besides that, I just tested with an image whose label has length 9, keeping max_label_length=10, and I did not get any error; it worked fine!
The issue is somewhere else.
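If every label passes the length check, the other thing worth ruling out is a character that is missing from the characters list, since LPRNet encodes each label against the characters_list_file from the spec. A hedged sketch (both file paths are placeholders):

import glob
import os

CHAR_LIST = "path/to/characters_list.txt"  # assumption: characters_list_file from the spec
LABEL_DIR = "path/to/labels"               # assumption: directory holding the .txt labels

# the characters list has one character per line
allowed = {line.strip() for line in open(CHAR_LIST) if line.strip()}

for path in sorted(glob.glob(os.path.join(LABEL_DIR, "*.txt"))):
    label = open(path).read().strip()
    missing = sorted({c for c in label if c not in allowed})
    if missing:
        print(f"{path}: characters {missing} not in the characters list")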