I am trying to run training following the Google Colab example below. The only change I made is pointing the data preparation at benchmarks/endtoend/br instead of the default us subset, but training does not proceed.
download_and_prepare_data.sh
#!/bin/bash
# usage:
#   bash download_and_prepare_data.sh /data-dir/openalpr_benchmark_dataset
set -e
set -x

if [ -z "$1" ]; then
    echo "usage: download_and_prepare_data.sh [data dir]"
    exit 1
fi

CURRENT_DIR=$(pwd)

echo "Cloning OpenALPR benchmark directory"
if [ ! -e benchmarks ]; then
    git clone https://github.com/openalpr/benchmarks benchmarks
fi

# Create the output directory.
OUTPUT_DIR="${1%/}"
mkdir -p "${OUTPUT_DIR}"

# Run the conversion (my only change from the original script: br instead of us).
SCRIPT_DIR=$(dirname "$(readlink -f "$0")")

echo "Preprocessing OpenALPR benchmarks data for BR"
python3 "$SCRIPT_DIR/preprocess_openalpr_benchmark.py" \
    --input_dir="$CURRENT_DIR/benchmarks/endtoend/br/" \
    --output_dir="$OUTPUT_DIR"
Training log:
For multi-GPU, change --gpus based on your machine.
Using TensorFlow backend.
2023-01-03 23:11:06.732242: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-01-03 23:11:11.185033: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2023-01-03 23:11:15.479726: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299995000 Hz
2023-01-03 23:11:15.479932: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x239fdc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-01-03 23:11:15.479963: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-01-03 23:11:15.482470: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-01-03 23:11:15.696290: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:15.697403: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x23a0300 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-01-03 23:11:15.697447: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2023-01-03 23:11:15.697707: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:15.698517: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2023-01-03 23:11:15.698583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-01-03 23:11:15.701305: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-01-03 23:11:15.702722: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-01-03 23:11:15.703134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-01-03 23:11:15.706343: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-01-03 23:11:15.707001: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-01-03 23:11:15.707265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-01-03 23:11:15.707411: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:15.708356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:15.709201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0
2023-01-03 23:11:15.709269: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-01-03 23:11:16.217943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-01-03 23:11:16.218012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0
2023-01-03 23:11:16.218032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N
2023-01-03 23:11:16.218378: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:16.219442: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1086] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-03 23:11:16.220349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13750 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
INFO: Log file already exists at /content/drive/MyDrive/results/lprnet/experiment_dir_unpruned/status.json
INFO: Merging specification from /content/drive/MyDrive/nvidia-tao/tensorflow/lprnet/specs/tutorial_spec.txt
INFO: Loading pretrained weights. This may take a while...
Layers that load weights from the pretrained model: ['conv1', 'bn_conv1', 'res2a_branch2a', 'bn2a_branch2a', 'res2a_branch1', 'res2a_branch2b', 'bn2a_branch1', 'bn2a_branch2b', 'res2b_branch2a', 'bn2b_branch2a', 'res2b_branch2b', 'bn2b_branch2b', 'res3a_branch2a', 'bn3a_branch2a', 'res3a_branch1', 'res3a_branch2b', 'bn3a_branch1', 'bn3a_branch2b', 'res3b_branch2a', 'bn3b_branch2a', 'res3b_branch2b', 'bn3b_branch2b', 'res4a_branch2a', 'bn4a_branch2a', 'res4a_branch1', 'res4a_branch2b', 'bn4a_branch1', 'bn4a_branch2b', 'res4b_branch2a', 'bn4b_branch2a', 'res4b_branch2b', 'bn4b_branch2b', 'res5a_branch2a', 'bn5a_branch2a', 'res5a_branch1', 'res5a_branch2b', 'bn5a_branch1', 'bn5a_branch2b', 'res5b_branch2a', 'bn5b_branch2a', 'res5b_branch2b', 'bn5b_branch2b', 'lstm', 'td_dense']
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
image_input (InputLayer) [(None, 3, 48, 96)] 0
__________________________________________________________________________________________________
tf_op_layer_Sum (TensorFlowOpLa [(None, 1, 48, 96)] 0 image_input[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 48, 96) 640 tf_op_layer_Sum[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 48, 96) 256 conv1[0][0]
__________________________________________________________________________________________________
re_lu (ReLU) (None, 64, 48, 96) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 48, 96) 0 re_lu[0][0]
__________________________________________________________________________________________________
res2a_branch2a (Conv2D) (None, 64, 48, 96) 36928 max_pooling2d[0][0]
__________________________________________________________________________________________________
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_1 (ReLU) (None, 64, 48, 96) 0 bn2a_branch2a[0][0]
__________________________________________________________________________________________________
res2a_branch1 (Conv2D) (None, 64, 48, 96) 4160 max_pooling2d[0][0]
__________________________________________________________________________________________________
res2a_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_1[0][0]
__________________________________________________________________________________________________
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96) 256 res2a_branch1[0][0]
__________________________________________________________________________________________________
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add (TensorFlowOpLa [(None, 64, 48, 96)] 0 bn2a_branch1[0][0]
bn2a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_2 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add[0][0]
__________________________________________________________________________________________________
res2b_branch2a (Conv2D) (None, 64, 48, 96) 36928 re_lu_2[0][0]
__________________________________________________________________________________________________
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_3 (ReLU) (None, 64, 48, 96) 0 bn2b_branch2a[0][0]
__________________________________________________________________________________________________
res2b_branch2b (Conv2D) (None, 64, 48, 96) 36928 re_lu_3[0][0]
__________________________________________________________________________________________________
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96) 256 res2b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_1 (TensorFlowOp [(None, 64, 48, 96)] 0 re_lu_2[0][0]
bn2b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_4 (ReLU) (None, 64, 48, 96) 0 tf_op_layer_add_1[0][0]
__________________________________________________________________________________________________
res3a_branch2a (Conv2D) (None, 128, 24, 48) 73856 re_lu_4[0][0]
__________________________________________________________________________________________________
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_5 (ReLU) (None, 128, 24, 48) 0 bn3a_branch2a[0][0]
__________________________________________________________________________________________________
res3a_branch1 (Conv2D) (None, 128, 24, 48) 8320 re_lu_4[0][0]
__________________________________________________________________________________________________
res3a_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_5[0][0]
__________________________________________________________________________________________________
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48) 512 res3a_branch1[0][0]
__________________________________________________________________________________________________
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_2 (TensorFlowOp [(None, 128, 24, 48) 0 bn3a_branch1[0][0]
bn3a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_6 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_2[0][0]
__________________________________________________________________________________________________
res3b_branch2a (Conv2D) (None, 128, 24, 48) 147584 re_lu_6[0][0]
__________________________________________________________________________________________________
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_7 (ReLU) (None, 128, 24, 48) 0 bn3b_branch2a[0][0]
__________________________________________________________________________________________________
res3b_branch2b (Conv2D) (None, 128, 24, 48) 147584 re_lu_7[0][0]
__________________________________________________________________________________________________
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48) 512 res3b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_3 (TensorFlowOp [(None, 128, 24, 48) 0 re_lu_6[0][0]
bn3b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_8 (ReLU) (None, 128, 24, 48) 0 tf_op_layer_add_3[0][0]
__________________________________________________________________________________________________
res4a_branch2a (Conv2D) (None, 256, 12, 24) 295168 re_lu_8[0][0]
__________________________________________________________________________________________________
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_9 (ReLU) (None, 256, 12, 24) 0 bn4a_branch2a[0][0]
__________________________________________________________________________________________________
res4a_branch1 (Conv2D) (None, 256, 12, 24) 33024 re_lu_8[0][0]
__________________________________________________________________________________________________
res4a_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_9[0][0]
__________________________________________________________________________________________________
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24) 1024 res4a_branch1[0][0]
__________________________________________________________________________________________________
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_4 (TensorFlowOp [(None, 256, 12, 24) 0 bn4a_branch1[0][0]
bn4a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_10 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_4[0][0]
__________________________________________________________________________________________________
res4b_branch2a (Conv2D) (None, 256, 12, 24) 590080 re_lu_10[0][0]
__________________________________________________________________________________________________
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_11 (ReLU) (None, 256, 12, 24) 0 bn4b_branch2a[0][0]
__________________________________________________________________________________________________
res4b_branch2b (Conv2D) (None, 256, 12, 24) 590080 re_lu_11[0][0]
__________________________________________________________________________________________________
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24) 1024 res4b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_5 (TensorFlowOp [(None, 256, 12, 24) 0 re_lu_10[0][0]
bn4b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_12 (ReLU) (None, 256, 12, 24) 0 tf_op_layer_add_5[0][0]
__________________________________________________________________________________________________
res5a_branch2a (Conv2D) (None, 300, 12, 24) 691500 re_lu_12[0][0]
__________________________________________________________________________________________________
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_13 (ReLU) (None, 300, 12, 24) 0 bn5a_branch2a[0][0]
__________________________________________________________________________________________________
res5a_branch1 (Conv2D) (None, 300, 12, 24) 77100 re_lu_12[0][0]
__________________________________________________________________________________________________
res5a_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_13[0][0]
__________________________________________________________________________________________________
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24) 1200 res5a_branch1[0][0]
__________________________________________________________________________________________________
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5a_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_6 (TensorFlowOp [(None, 300, 12, 24) 0 bn5a_branch1[0][0]
bn5a_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_14 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_6[0][0]
__________________________________________________________________________________________________
res5b_branch2a (Conv2D) (None, 300, 12, 24) 810300 re_lu_14[0][0]
__________________________________________________________________________________________________
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2a[0][0]
__________________________________________________________________________________________________
re_lu_15 (ReLU) (None, 300, 12, 24) 0 bn5b_branch2a[0][0]
__________________________________________________________________________________________________
res5b_branch2b (Conv2D) (None, 300, 12, 24) 810300 re_lu_15[0][0]
__________________________________________________________________________________________________
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24) 1200 res5b_branch2b[0][0]
__________________________________________________________________________________________________
tf_op_layer_add_7 (TensorFlowOp [(None, 300, 12, 24) 0 re_lu_14[0][0]
bn5b_branch2b[0][0]
__________________________________________________________________________________________________
re_lu_16 (ReLU) (None, 300, 12, 24) 0 tf_op_layer_add_7[0][0]
__________________________________________________________________________________________________
permute_feature (Permute) (None, 24, 12, 300) 0 re_lu_16[0][0]
__________________________________________________________________________________________________
flatten_feature (Reshape) (None, 24, 3600) 0 permute_feature[0][0]
__________________________________________________________________________________________________
lstm (LSTM) (None, 24, 512) 8423424 flatten_feature[0][0]
__________________________________________________________________________________________________
td_dense (TimeDistributed) (None, 24, 36) 18468 lstm[0][0]
__________________________________________________________________________________________________
softmax (Softmax) (None, 24, 36) 0 td_dense[0][0]
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
INFO: Number of images in the training dataset: 56
INFO: Number of images in the validation dataset: 55
INFO: Log file already exists at /content/drive/MyDrive/results/lprnet/experiment_dir_unpruned/status.json
INFO: Starting Training Loop.
Epoch 1/24
INFO: 'O'
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/lprnet/scripts/train.py>", line 3, in <module>
File "<frozen iva.lprnet.scripts.train>", line 348, in <module>
File "<frozen iva.lprnet.scripts.train>", line 344, in main
File "<frozen iva.lprnet.scripts.train>", line 331, in main
File "<frozen iva.lprnet.scripts.train>", line 250, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
steps_name='steps_per_epoch')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
batch_data = _get_next_batch(generator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
generator_output = next(generator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
six.reraise(*sys.exc_info())
File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
return _SHARED_SEQUENCES[uid][i]
File "<frozen iva.lprnet.dataloader.data_sequence>", line 130, in __getitem__
File "<frozen iva.lprnet.dataloader.data_sequence>", line 130, in <listcomp>
KeyError: 'O'
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
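Looking at the traceback, it ends in KeyError: 'O' inside the dataloader's __getitem__, so my guess is that label encoding fails on a character that is not in the characters list referenced by tutorial_spec.txt (the softmax output above has 36 classes, which I read as 35 characters plus a CTC blank, and the BR plate labels contain 'O'). Below is a minimal sketch of the failure mode I suspect; the names and values are hypothetical stand-ins, not the actual TAO internals:

# Hypothetical illustration of the suspected failure, not TAO's real code:
# the dataloader seems to build a char -> class-index mapping from the
# characters list in the spec and encode each label character through it.
characters = list("0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ")  # stand-in alphabet without 'O'
classes = {ch: idx for idx, ch in enumerate(characters)}

label = "BRA1O23"  # example Brazilian plate text containing the letter 'O'
try:
    encoded = [classes[ch] for ch in label]
except KeyError as err:
    print("KeyError:", err)  # prints KeyError: 'O', matching the traceback above

If that reading is correct, I assume I need to supply a characters list that covers the BR plate labels, but I am not sure where in the Colab example that is supposed to be changed.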
Any suggestions?