For multi-GPU, change --gpus based on your machine.

/home/adminp/.local/lib/python3.6/site-packages/tlt/__init__.py:20: DeprecationWarning: The `nvidia-tlt` package will be deprecated soon. Going forward please migrate to using the `nvidia-tao` package.
  warnings.warn(message, DeprecationWarning)
~/.tao_mounts.json wasn't found. Falling back to obtain mount points and docker configs from ~/.tlt_mounts.json. Please note that this will be deprecated going forward.
2022-01-25 10:02:41,522 [INFO] root: Registry: ['nvcr.io']
2022-01-25 10:02:41,566 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-25 10:02:41,592 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/adminp/.tlt_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:57: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:61: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.
2022-01-25 04:32:46,412 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/tlt-experiments/lprnet/specs/tutorial_spec.txt
Initialize optimizer
Model: "lpnet_baseline_18"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
image_input (InputLayer)        [(None, 3, 48, 96)]  0
tf_op_layer_Sum (TensorFlowOpLa [(None, 1, 48, 96)]  0           image_input[0][0]
conv1 (Conv2D)                  (None, 64, 48, 96)   640         tf_op_layer_Sum[0][0]
bn_conv1 (BatchNormalization)   (None, 64, 48, 96)   256         conv1[0][0]
re_lu (ReLU)                    (None, 64, 48, 96)   0           bn_conv1[0][0]
max_pooling2d (MaxPooling2D)    (None, 64, 48, 96)   0           re_lu[0][0]
res2a_branch2a (Conv2D)         (None, 64, 48, 96)   36928       max_pooling2d[0][0]
bn2a_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2a[0][0]
re_lu_1 (ReLU)                  (None, 64, 48, 96)   0           bn2a_branch2a[0][0]
res2a_branch1 (Conv2D)          (None, 64, 48, 96)   4160        max_pooling2d[0][0]
res2a_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_1[0][0]
bn2a_branch1 (BatchNormalizatio (None, 64, 48, 96)   256         res2a_branch1[0][0]
bn2a_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2a_branch2b[0][0]
tf_op_layer_add (TensorFlowOpLa [(None, 64, 48, 96)] 0           bn2a_branch1[0][0], bn2a_branch2b[0][0]
re_lu_2 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add[0][0]
res2b_branch2a (Conv2D)         (None, 64, 48, 96)   36928       re_lu_2[0][0]
bn2b_branch2a (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2a[0][0]
re_lu_3 (ReLU)                  (None, 64, 48, 96)   0           bn2b_branch2a[0][0]
res2b_branch2b (Conv2D)         (None, 64, 48, 96)   36928       re_lu_3[0][0]
bn2b_branch2b (BatchNormalizati (None, 64, 48, 96)   256         res2b_branch2b[0][0]
tf_op_layer_add_1 (TensorFlowOp [(None, 64, 48, 96)] 0           re_lu_2[0][0], bn2b_branch2b[0][0]
re_lu_4 (ReLU)                  (None, 64, 48, 96)   0           tf_op_layer_add_1[0][0]
res3a_branch2a (Conv2D)         (None, 128, 24, 48)  73856       re_lu_4[0][0]
bn3a_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2a[0][0]
re_lu_5 (ReLU)                  (None, 128, 24, 48)  0           bn3a_branch2a[0][0]
res3a_branch1 (Conv2D)          (None, 128, 24, 48)  8320        re_lu_4[0][0]
res3a_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_5[0][0]
bn3a_branch1 (BatchNormalizatio (None, 128, 24, 48)  512         res3a_branch1[0][0]
bn3a_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3a_branch2b[0][0]
tf_op_layer_add_2 (TensorFlowOp [(None, 128, 24, 48) 0           bn3a_branch1[0][0], bn3a_branch2b[0][0]
re_lu_6 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_2[0][0]
res3b_branch2a (Conv2D)         (None, 128, 24, 48)  147584      re_lu_6[0][0]
bn3b_branch2a (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2a[0][0]
re_lu_7 (ReLU)                  (None, 128, 24, 48)  0           bn3b_branch2a[0][0]
res3b_branch2b (Conv2D)         (None, 128, 24, 48)  147584      re_lu_7[0][0]
bn3b_branch2b (BatchNormalizati (None, 128, 24, 48)  512         res3b_branch2b[0][0]
tf_op_layer_add_3 (TensorFlowOp [(None, 128, 24, 48) 0           re_lu_6[0][0], bn3b_branch2b[0][0]
re_lu_8 (ReLU)                  (None, 128, 24, 48)  0           tf_op_layer_add_3[0][0]
res4a_branch2a (Conv2D)         (None, 256, 12, 24)  295168      re_lu_8[0][0]
bn4a_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2a[0][0]
re_lu_9 (ReLU)                  (None, 256, 12, 24)  0           bn4a_branch2a[0][0]
res4a_branch1 (Conv2D)          (None, 256, 12, 24)  33024       re_lu_8[0][0]
res4a_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_9[0][0]
bn4a_branch1 (BatchNormalizatio (None, 256, 12, 24)  1024        res4a_branch1[0][0]
bn4a_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4a_branch2b[0][0]
tf_op_layer_add_4 (TensorFlowOp [(None, 256, 12, 24) 0           bn4a_branch1[0][0], bn4a_branch2b[0][0]
re_lu_10 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_4[0][0]
res4b_branch2a (Conv2D)         (None, 256, 12, 24)  590080      re_lu_10[0][0]
bn4b_branch2a (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2a[0][0]
re_lu_11 (ReLU)                 (None, 256, 12, 24)  0           bn4b_branch2a[0][0]
res4b_branch2b (Conv2D)         (None, 256, 12, 24)  590080      re_lu_11[0][0]
bn4b_branch2b (BatchNormalizati (None, 256, 12, 24)  1024        res4b_branch2b[0][0]
tf_op_layer_add_5 (TensorFlowOp [(None, 256, 12, 24) 0           re_lu_10[0][0], bn4b_branch2b[0][0]
re_lu_12 (ReLU)                 (None, 256, 12, 24)  0           tf_op_layer_add_5[0][0]
res5a_branch2a (Conv2D)         (None, 300, 12, 24)  691500      re_lu_12[0][0]
bn5a_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2a[0][0]
re_lu_13 (ReLU)                 (None, 300, 12, 24)  0           bn5a_branch2a[0][0]
res5a_branch1 (Conv2D)          (None, 300, 12, 24)  77100       re_lu_12[0][0]
res5a_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_13[0][0]
bn5a_branch1 (BatchNormalizatio (None, 300, 12, 24)  1200        res5a_branch1[0][0]
bn5a_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5a_branch2b[0][0]
tf_op_layer_add_6 (TensorFlowOp [(None, 300, 12, 24) 0           bn5a_branch1[0][0], bn5a_branch2b[0][0]
re_lu_14 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_6[0][0]
res5b_branch2a (Conv2D)         (None, 300, 12, 24)  810300      re_lu_14[0][0]
bn5b_branch2a (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2a[0][0]
re_lu_15 (ReLU)                 (None, 300, 12, 24)  0           bn5b_branch2a[0][0]
res5b_branch2b (Conv2D)         (None, 300, 12, 24)  810300      re_lu_15[0][0]
bn5b_branch2b (BatchNormalizati (None, 300, 12, 24)  1200        res5b_branch2b[0][0]
tf_op_layer_add_7 (TensorFlowOp [(None, 300, 12, 24) 0           re_lu_14[0][0], bn5b_branch2b[0][0]
re_lu_16 (ReLU)                 (None, 300, 12, 24)  0           tf_op_layer_add_7[0][0]
permute_feature (Permute)       (None, 24, 12, 300)  0           re_lu_16[0][0]
flatten_feature (Reshape)       (None, 24, 3600)     0           permute_feature[0][0]
lstm (LSTM)                     (None, 24, 512)      8423424     flatten_feature[0][0]
td_dense (TimeDistributed)      (None, 24, 36)       18468       lstm[0][0]
softmax (Softmax)               (None, 24, 36)       0           td_dense[0][0]
==================================================================================================
Total params: 14,432,480
Trainable params: 14,424,872
Non-trainable params: 7,608
__________________________________________________________________________________________________
2022-01-25 04:32:48,714 [INFO] __main__: Number of images in the training dataset: 111
2022-01-25 04:32:48,714 [INFO] __main__: Number of images in the validation dataset: 110
Epoch 1/24
3/4 [=====================>........] - ETA: 3s - loss: 71.1704
d7988858fa82:48:74 [0] NCCL INFO Bootstrap : Using lo:127.0.0.1<0>
d7988858fa82:48:74 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
d7988858fa82:48:74 [0] NCCL INFO NET/IB : No device found.
d7988858fa82:48:74 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
d7988858fa82:48:74 [0] NCCL INFO Using network Socket
NCCL version 2.9.9+cuda11.3
d7988858fa82:48:74 [0] NCCL INFO Channel 00/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 01/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 02/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 03/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 04/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 05/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 06/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 07/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 08/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 09/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 10/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 11/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 12/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 13/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 14/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 15/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 16/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 17/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 18/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 19/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 20/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 21/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 22/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 23/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 24/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 25/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 26/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 27/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 28/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 29/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 30/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Channel 31/32 : 0
d7988858fa82:48:74 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
d7988858fa82:48:74 [0] NCCL INFO Connected all rings
d7988858fa82:48:74 [0] NCCL INFO Connected all trees
d7988858fa82:48:74 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
d7988858fa82:48:74 [0] NCCL INFO comm 0x7f0e333af310 rank 0 nranks 1 cudaDev 0 busId 1000 - Init COMPLETE
4/4 [==============================] - 10s 3s/step - loss: 71.0026
Epoch 2/24
4/4 [==============================] - 1s 241ms/step - loss: 66.1190
Epoch 3/24
4/4 [==============================] - 1s 201ms/step - loss: 51.3621
Epoch 4/24
4/4 [==============================] - 1s 198ms/step - loss: 35.5847
Epoch 5/24
3/4 [=====================>........] - ETA: 0s - loss: 29.0931
Epoch 00005: saving model to /workspace/tlt-experiments/lprnet/experiment_dir_unpruned/weights/lprnet_epoch-05.tlt
*******************************************
Accuracy: 0 / 110  0.0
*******************************************
4/4 [==============================] - 8s 2s/step - loss: 29.1470
Epoch 6/24
4/4 [==============================] - 1s 188ms/step - loss: 29.0384
Epoch 7/24
4/4 [==============================] - 1s 186ms/step - loss: 29.7068
Epoch 8/24
4/4 [==============================] - 1s 187ms/step - loss: 30.0909
Epoch 9/24
4/4 [==============================] - 1s 208ms/step - loss: 29.9499
Epoch 10/24
3/4 [=====================>........] - ETA: 0s - loss: 29.6294
Epoch 00010: saving model to /workspace/tlt-experiments/lprnet/experiment_dir_unpruned/weights/lprnet_epoch-10.tlt
*******************************************
Accuracy: 0 / 110  0.0
*******************************************
4/4 [==============================] - 4s 1s/step - loss: 29.2358
Epoch 11/24
4/4 [==============================] - 1s 208ms/step - loss: 28.6399
Epoch 12/24
4/4 [==============================] - 1s 192ms/step - loss: 28.3016
Epoch 13/24
4/4 [==============================] - 1s 197ms/step - loss: 27.9940
Epoch 14/24
4/4 [==============================] - 1s 192ms/step - loss: 27.8672
Epoch 15/24
3/4 [=====================>........] - ETA: 0s - loss: 27.7885
Epoch 00015: saving model to /workspace/tlt-experiments/lprnet/experiment_dir_unpruned/weights/lprnet_epoch-15.tlt
*******************************************
Accuracy: 0 / 110  0.0
*******************************************
4/4 [==============================] - 4s 1s/step - loss: 27.9495
Epoch 16/24
4/4 [==============================] - 1s 189ms/step - loss: 27.8031
Epoch 17/24
4/4 [==============================] - 1s 190ms/step - loss: 27.5886
Epoch 18/24
4/4 [==============================] - 1s 201ms/step - loss: 27.6862
Epoch 19/24
4/4 [==============================] - 1s 194ms/step - loss: 27.6148
Epoch 20/24
3/4 [=====================>........] - ETA: 0s - loss: 27.7921
Epoch 00020: saving model to /workspace/tlt-experiments/lprnet/experiment_dir_unpruned/weights/lprnet_epoch-20.tlt
*******************************************
Accuracy: 0 / 110  0.0
*******************************************
4/4 [==============================] - 4s 1s/step - loss: 27.5755
Epoch 21/24
4/4 [==============================] - 1s 190ms/step - loss: 27.6340
Epoch 22/24
4/4 [==============================] - 1s 188ms/step - loss: 27.6300
Epoch 23/24
4/4 [==============================] - 1s 200ms/step - loss: 27.5241
Epoch 24/24
3/4 [=====================>........] - ETA: 0s - loss: 27.7694
Epoch 00024: saving model to /workspace/tlt-experiments/lprnet/experiment_dir_unpruned/weights/lprnet_epoch-24.tlt
4/4 [==============================] - 2s 434ms/step - loss: 27.5343
*******************************************
Accuracy: 0 / 110  0.0
*******************************************
2022-01-25 10:03:39,366 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
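Regarding the "Docker will run the commands as root" warning near the top: the launcher reads mount points and docker options from `~/.tlt_mounts.json` (deprecated in favor of `~/.tao_mounts.json`). A minimal sketch of that file with the suggested `"user"` entry; the `source` path and the `1000:1000` value are placeholders, not values from this log:

```json
{
    "Mounts": [
        {
            "source": "/home/adminp/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```

Substitute the output of `id -u` and `id -g` for the UID:GID pair, as the warning suggests.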
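For reference, the "--gpus" note at the top refers to the TAO launcher command line. The exact command used for this run is not shown in the log; a minimal sketch, assuming the spec and result paths visible above and a hypothetical `$KEY` encryption key:

```shell
# Sketch of the launcher invocation (not taken verbatim from this log).
# This log shows a single-GPU run (NCCL reports nranks 1); raise --gpus
# to match the number of GPUs on your machine for multi-GPU training.
tao lprnet train \
    -e /workspace/tlt-experiments/lprnet/specs/tutorial_spec.txt \
    -r /workspace/tlt-experiments/lprnet/experiment_dir_unpruned \
    -k $KEY \
    --gpus 1
```

The paths are container-side paths, so they must be mapped in the mounts file for the launcher to see them.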