Detectnet_v2: Assertion Error while training and validation

neo21995 · October 1, 2021, 6:27am

I am getting an assertion error while training a DetectNet_v2 model. I have set validation to run every 10th epoch, and when training reaches a validation epoch it fails with the error below.

2021-09-30 09:09:32,836 [INFO] root: Registry: ['nvcr.io']
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:68: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:68: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-09-30 09:09:41,205 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-09-30 09:09:41,205 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-09-30 09:09:41,812 [INFO] iva.common.logging.logging: Log file already exists at /root/nvidia/cv_samples_v1.2.0/detectnet_v2/experiment_dir_unpruned/status.json
2021-09-30 09:09:41,812 [INFO] __main__: Loading experiment spec at /root/nvidia/cv_samples_v1.2.0/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt.
2021-09-30 09:09:41,814 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /root/nvidia/cv_samples_v1.2.0/detectnet_v2/specs/detectnet_v2_train_resnet18_kitti.txt
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2021-09-30 09:09:42,149 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2021-09-30 09:09:42,150 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-09-30 09:09:42,153 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-09-30 09:09:42,251 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-09-30 09:09:42,252 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-09-30 09:09:42,274 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2021-09-30 09:09:43,474 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2021-09-30 09:09:43,723 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2021-09-30 09:09:43,723 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2021-09-30 09:09:44,072 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2021-09-30 09:09:51,030 [INFO] iva.detectnet_v2.objectives.bbox_objective: Default L1 loss function will be used.

Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) (None, 3, 384, 1248) 0

conv1 (Conv2D) (None, 64, 192, 624) 9472 input_1[0][0]

bn_conv1 (BatchNormalization) (None, 64, 192, 624) 256 conv1[0][0]

activation_1 (Activation) (None, 64, 192, 624) 0 bn_conv1[0][0]

block_1a_conv_1 (Conv2D) (None, 64, 96, 312) 36928 activation_1[0][0]

block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_1[0][0]

block_1a_relu_1 (Activation) (None, 64, 96, 312) 0 block_1a_bn_1[0][0]

block_1a_conv_2 (Conv2D) (None, 64, 96, 312) 36928 block_1a_relu_1[0][0]

block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312) 4160 activation_1[0][0]

block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1a_conv_2[0][0]

block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312) 256 block_1a_conv_shortcut[0][0]

add_1 (Add) (None, 64, 96, 312) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]

block_1a_relu (Activation) (None, 64, 96, 312) 0 add_1[0][0]

block_1b_conv_1 (Conv2D) (None, 64, 96, 312) 36928 block_1a_relu[0][0]

block_1b_bn_1 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_1[0][0]

block_1b_relu_1 (Activation) (None, 64, 96, 312) 0 block_1b_bn_1[0][0]

block_1b_conv_2 (Conv2D) (None, 64, 96, 312) 36928 block_1b_relu_1[0][0]

block_1b_bn_2 (BatchNormalizati (None, 64, 96, 312) 256 block_1b_conv_2[0][0]

add_2 (Add) (None, 64, 96, 312) 0 block_1b_bn_2[0][0]
block_1a_relu[0][0]

block_1b_relu (Activation) (None, 64, 96, 312) 0 add_2[0][0]

block_2a_conv_1 (Conv2D) (None, 128, 48, 156) 73856 block_1b_relu[0][0]

block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_1[0][0]

block_2a_relu_1 (Activation) (None, 128, 48, 156) 0 block_2a_bn_1[0][0]

block_2a_conv_2 (Conv2D) (None, 128, 48, 156) 147584 block_2a_relu_1[0][0]

block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320 block_1b_relu[0][0]

block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2a_conv_2[0][0]

block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512 block_2a_conv_shortcut[0][0]

add_3 (Add) (None, 128, 48, 156) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]

block_2a_relu (Activation) (None, 128, 48, 156) 0 add_3[0][0]

block_2b_conv_1 (Conv2D) (None, 128, 48, 156) 147584 block_2a_relu[0][0]

block_2b_bn_1 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_1[0][0]

block_2b_relu_1 (Activation) (None, 128, 48, 156) 0 block_2b_bn_1[0][0]

block_2b_conv_2 (Conv2D) (None, 128, 48, 156) 147584 block_2b_relu_1[0][0]

block_2b_bn_2 (BatchNormalizati (None, 128, 48, 156) 512 block_2b_conv_2[0][0]

add_4 (Add) (None, 128, 48, 156) 0 block_2b_bn_2[0][0]
block_2a_relu[0][0]

block_2b_relu (Activation) (None, 128, 48, 156) 0 add_4[0][0]

block_3a_conv_1 (Conv2D) (None, 256, 24, 78) 295168 block_2b_relu[0][0]

block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_1[0][0]

block_3a_relu_1 (Activation) (None, 256, 24, 78) 0 block_3a_bn_1[0][0]

block_3a_conv_2 (Conv2D) (None, 256, 24, 78) 590080 block_3a_relu_1[0][0]

block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78) 33024 block_2b_relu[0][0]

block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3a_conv_2[0][0]

block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78) 1024 block_3a_conv_shortcut[0][0]

add_5 (Add) (None, 256, 24, 78) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]

block_3a_relu (Activation) (None, 256, 24, 78) 0 add_5[0][0]

block_3b_conv_1 (Conv2D) (None, 256, 24, 78) 590080 block_3a_relu[0][0]

block_3b_bn_1 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_1[0][0]

block_3b_relu_1 (Activation) (None, 256, 24, 78) 0 block_3b_bn_1[0][0]

block_3b_conv_2 (Conv2D) (None, 256, 24, 78) 590080 block_3b_relu_1[0][0]

block_3b_bn_2 (BatchNormalizati (None, 256, 24, 78) 1024 block_3b_conv_2[0][0]

add_6 (Add) (None, 256, 24, 78) 0 block_3b_bn_2[0][0]
block_3a_relu[0][0]

block_3b_relu (Activation) (None, 256, 24, 78) 0 add_6[0][0]

block_4a_conv_1 (Conv2D) (None, 512, 24, 78) 1180160 block_3b_relu[0][0]

block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_1[0][0]

block_4a_relu_1 (Activation) (None, 512, 24, 78) 0 block_4a_bn_1[0][0]

block_4a_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 block_4a_relu_1[0][0]

block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78) 131584 block_3b_relu[0][0]

block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4a_conv_2[0][0]

block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78) 2048 block_4a_conv_shortcut[0][0]

add_7 (Add) (None, 512, 24, 78) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]

block_4a_relu (Activation) (None, 512, 24, 78) 0 add_7[0][0]

block_4b_conv_1 (Conv2D) (None, 512, 24, 78) 2359808 block_4a_relu[0][0]

block_4b_bn_1 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_1[0][0]

block_4b_relu_1 (Activation) (None, 512, 24, 78) 0 block_4b_bn_1[0][0]

block_4b_conv_2 (Conv2D) (None, 512, 24, 78) 2359808 block_4b_relu_1[0][0]

block_4b_bn_2 (BatchNormalizati (None, 512, 24, 78) 2048 block_4b_conv_2[0][0]

add_8 (Add) (None, 512, 24, 78) 0 block_4b_bn_2[0][0]
block_4a_relu[0][0]

block_4b_relu (Activation) (None, 512, 24, 78) 0 add_8[0][0]

output_bbox (Conv2D) (None, 16, 24, 78) 8208 block_4b_relu[0][0]

output_cov (Conv2D) (None, 4, 24, 78) 2052 block_4b_relu[0][0]

Total params: 11,205,588
Trainable params: 11,195,860
Non-trainable params: 9,728
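
As a sanity check on the summary above, the parameter counts can be reproduced by hand. This is only a sketch: the kernel sizes (7x7 for conv1, 3x3 inside the residual blocks, 1x1 for the shortcuts and the two output heads) are not printed in the log and are assumed here because they match the reported per-layer counts. The helper names are made up for illustration.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Parameters of a square-kernel Conv2D layer with bias."""
    return kernel * kernel * in_ch * out_ch + out_ch


def residual_stage(in_ch, out_ch):
    """Conv parameters and BatchNorm channel count of one two-block stage.

    Block "a" has two 3x3 convs plus a 1x1 shortcut conv (three BN layers);
    block "b" has two 3x3 convs (two BN layers).
    """
    convs = (
        conv2d_params(3, in_ch, out_ch)           # block a, conv_1
        + conv2d_params(3, out_ch, out_ch)        # block a, conv_2
        + conv2d_params(1, in_ch, out_ch)         # block a, shortcut
        + 2 * conv2d_params(3, out_ch, out_ch)    # block b, conv_1 and conv_2
    )
    bn_channels = 5 * out_ch
    return convs, bn_channels


conv_total = conv2d_params(7, 3, 64)   # conv1 (assumed 7x7 kernel)
bn_channels = 64                       # bn_conv1
for in_ch, out_ch in [(64, 64), (64, 128), (128, 256), (256, 512)]:
    convs, bn = residual_stage(in_ch, out_ch)
    conv_total += convs
    bn_channels += bn

# output_bbox (16 channels) and output_cov (4 channels), assumed 1x1 kernels
conv_total += conv2d_params(1, 512, 16) + conv2d_params(1, 512, 4)

# Each BatchNorm channel has gamma and beta (trainable) plus a moving
# mean and moving variance (non-trainable).
non_trainable = 2 * bn_channels
trainable = conv_total + 2 * bn_channels
total = trainable + non_trainable

print(total, trainable, non_trainable)
```

Under those assumptions the three values agree with the totals reported in the summary, so the model itself appears to have been built as expected.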

2021-09-30 09:09:51,071 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-09-30 09:09:51,071 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-09-30 09:09:51,072 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-09-30 09:09:51,072 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2021-09-30 09:09:51,072 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 3599, number of sources: 1, batch size per gpu: 1, steps: 3599

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

2021-09-30 09:09:51,115 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0ef0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0ef0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:51,166 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0ef0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0ef0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:51,187 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-09-30 09:09:51,455 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2021-09-30 09:09:51,463 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-09-30 09:09:51,463 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f63e8317518>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f63e8317518>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:51,477 [WARNING] tensorflow: Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f63e8317518>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f63e8317518>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:51,870 [INFO] __main__: Found 3599 samples in training set
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

2021-09-30 09:09:51,984 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:89: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

2021-09-30 09:09:52,110 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:89: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

2021-09-30 09:09:52,127 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.

2021-09-30 09:09:52,303 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2021-09-30 09:09:52,367 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py:587: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

2021-09-30 09:09:52,380 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py:587: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

2021-09-30 09:09:54,290 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-09-30 09:09:54,290 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-09-30 09:09:54,290 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-09-30 09:09:54,290 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2021-09-30 09:09:54,290 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 399, number of sources: 1, batch size per gpu: 1, steps: 399
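
The step counts in the two data loader messages (3599 for training, 399 for validation) follow from the dataset sizes and the batch size of 1 per GPU. The sketch below reproduces them and estimates the global step at which the first validation pass would start. It assumes a single GPU (the log reports "shard 0 of 1") and that validation fires exactly at the end of every 10th epoch; the function and variable names are illustrative, not taken from the toolkit.

```python
import math


def steps_per_epoch(dataset_size, batch_size_per_gpu, num_gpus=1):
    """Number of batches needed to see every sample once."""
    return math.ceil(dataset_size / (batch_size_per_gpu * num_gpus))


train_steps = steps_per_epoch(3599, 1)   # training set size from the log
val_steps = steps_per_epoch(399, 1)      # validation set size from the log

# Assumption: validation runs at the end of every 10th epoch.
validation_period_epochs = 10
first_validation_step = validation_period_epochs * train_steps

print(train_steps, val_steps, first_validation_step)
```

If the assertion only shows up around that step, the failure is tied to the validation pass rather than to the training loop itself.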
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0f60>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0f60>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:54,302 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0f60>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f64094b0f60>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:54,324 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-09-30 09:09:54,601 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2021-09-30 09:09:54,606 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-09-30 09:09:54,607 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f636849a550>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f636849a550>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:54,622 [WARNING] tensorflow: Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f636849a550>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f636849a550>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-09-30 09:09:54,892 [INFO] __main__: Found 399 samples in validation set
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py:40: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.

2021-09-30 09:09:55,534 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py:40: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.

2021-09-30 09:09:56,769 [INFO] __main__: Checkpoint interval: 100
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:108: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

2021-09-30 09:09:56,769 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:108: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

2021-09-30 09:09:56,769 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

2021-09-30 09:09:56,770 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

2021-09-30 09:09:56,771 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.

2021-09-30 09:09:56,773 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.

2021-09-30 09:09:56,774 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:73: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.

2021-09-30 09:09:56,774 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:73: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.

INFO:tensorflow:Create CheckpointSaverHook.
2021-09-30 09:09:56,774 [INFO] tensorflow: Create CheckpointSaverHook.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:99: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.

2021-09-30 09:09:56,774 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:99: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

2021-09-30 09:09:59,359 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.

INFO:tensorflow:Graph was finalized.
2021-09-30 09:10:00,412 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp60ihwsqh/model.ckpt-0
2021-09-30 09:10:00,875 [INFO] tensorflow: Restoring parameters from /tmp/tmp60ihwsqh/model.ckpt-0
INFO:tensorflow:Running local_init_op.
2021-09-30 09:10:02,197 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-09-30 09:10:02,849 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2021-09-30 09:10:08,671 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.09854018, step = 0
2021-09-30 09:10:32,798 [INFO] tensorflow: epoch = 0.0, learning_rate = 4.9999994e-06, loss = 0.09854018, step = 0
2021-09-30 09:10:32,801 [INFO] iva.detectnet_v2.tfhooks.task_progress_monitor_hook: Epoch 0/50: loss: 0.09854 learning rate: 0.00000 Time taken: 0:00:00 ETA: 0:00:00
2021-09-30 09:10:32,801 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.149
2021-09-30 09:10:36,592 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 2.383
INFO:tensorflow:epoch = 0.011947763267574326, learning_rate = 5.0553253e-06, loss = 0.072210506, step = 43 (5.114 sec)
2021-09-30 09:10:37,912 [INFO] tensorflow: epoch = 0.011947763267574326, learning_rate = 5.0553253e-06, loss = 0.072210506, step = 43 (5.114 sec)
2021-09-30 09:10:38,329 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.396
2021-09-30 09:10:40,092 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.177
2021-09-30 09:10:41,826 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.423
INFO:tensorflow:epoch = 0.03223117532647957, learning_rate = 5.150654e-06, loss = 0.047179792, step = 116 (5.113 sec)
2021-09-30 09:10:43,025 [INFO] tensorflow: epoch = 0.03223117532647957, learning_rate = 5.150654e-06, loss = 0.047179792, step = 116 (5.113 sec)
2021-09-30 09:10:43,587 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.196
2021-09-30 09:10:45,351 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.179
2021-09-30 09:10:47,112 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.201
INFO:tensorflow:epoch = 0.052236732425673796, learning_rate = 5.2464397e-06, loss = 0.028572386, step = 188 (5.074 sec)
2021-09-30 09:10:48,099 [INFO] tensorflow: epoch = 0.052236732425673796, learning_rate = 5.2464397e-06, loss = 0.028572386, step = 188 (5.074 sec)
2021-09-30 09:10:48,865 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.256
2021-09-30 09:10:50,615 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.288
2021-09-30 09:10:52,373 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.230
INFO:tensorflow:epoch = 0.07252014448457904, learning_rate = 5.345372e-06, loss = 0.01838489, step = 261 (5.123 sec)
2021-09-30 09:10:53,222 [INFO] tensorflow: epoch = 0.07252014448457904, learning_rate = 5.345372e-06, loss = 0.01838489, step = 261 (5.123 sec)
2021-09-30 09:10:54,140 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.144
2021-09-30 09:10:55,902 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.196
2021-09-30 09:10:57,673 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.116
INFO:tensorflow:epoch = 0.09252570158377327, learning_rate = 5.4447787e-06, loss = 0.013630683, step = 333 (5.096 sec)
2021-09-30 09:10:58,317 [INFO] tensorflow: epoch = 0.09252570158377327, learning_rate = 5.4447787e-06, loss = 0.013630683, step = 333 (5.096 sec)
2021-09-30 09:10:59,444 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.123
INFO:tensorflow:global_step/sec: 13.1248
2021-09-30 09:11:00,153 [INFO] tensorflow: global_step/sec: 13.1248
2021-09-30 09:11:01,241 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.910
2021-09-30 09:11:03,014 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.106
INFO:tensorflow:epoch = 0.11253125868296748, learning_rate = 5.546034e-06, loss = 0.010261499, step = 405 (5.120 sec)
2021-09-30 09:11:03,437 [INFO] tensorflow: epoch = 0.11253125868296748, learning_rate = 5.546034e-06, loss = 0.010261499, step = 405 (5.120 sec)
2021-09-30 09:11:04,780 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.160
2021-09-30 09:11:06,564 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.015
2021-09-30 09:11:08,341 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.068
INFO:tensorflow:epoch = 0.1325368157821617, learning_rate = 5.649172e-06, loss = 0.009401453, step = 477 (5.119 sec)
2021-09-30 09:11:08,556 [INFO] tensorflow: epoch = 0.1325368157821617, learning_rate = 5.649172e-06, loss = 0.009401453, step = 477 (5.119 sec)
2021-09-30 09:11:10,123 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.036
2021-09-30 09:11:11,893 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.125
INFO:tensorflow:epoch = 0.15254237288135591, learning_rate = 5.754223e-06, loss = 0.007666658, step = 549 (5.129 sec)
2021-09-30 09:11:13,685 [INFO] tensorflow: epoch = 0.15254237288135591, learning_rate = 5.754223e-06, loss = 0.007666658, step = 549 (5.129 sec)
2021-09-30 09:11:13,685 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.948
2021-09-30 09:11:15,469 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.017
2021-09-30 09:11:17,243 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.095
INFO:tensorflow:epoch = 0.17254792998055013, learning_rate = 5.8612327e-06, loss = 0.004857765, step = 621 (5.140 sec)
2021-09-30 09:11:18,825 [INFO] tensorflow: epoch = 0.17254792998055013, learning_rate = 5.8612327e-06, loss = 0.004857765, step = 621 (5.140 sec)
2021-09-30 09:11:19,040 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.915
2021-09-30 09:11:20,834 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.936
2021-09-30 09:11:22,624 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.967
INFO:tensorflow:epoch = 0.19227563212003332, learning_rate = 5.968707e-06, loss = 0.004560452, step = 692 (5.082 sec)
2021-09-30 09:11:23,907 [INFO] tensorflow: epoch = 0.19227563212003332, learning_rate = 5.968707e-06, loss = 0.004560452, step = 692 (5.082 sec)
2021-09-30 09:11:24,407 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.024
INFO:tensorflow:global_step/sec: 14.0104
2021-09-30 09:11:25,776 [INFO] tensorflow: global_step/sec: 14.0104
2021-09-30 09:11:26,205 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.912
2021-09-30 09:11:27,984 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.051
INFO:tensorflow:epoch = 0.2120033342595165, learning_rate = 6.0781463e-06, loss = 0.0043286113, step = 763 (5.078 sec)
2021-09-30 09:11:28,986 [INFO] tensorflow: epoch = 0.2120033342595165, learning_rate = 6.0781463e-06, loss = 0.0043286113, step = 763 (5.078 sec)
2021-09-30 09:11:29,769 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 14.008
2021-09-30 09:11:31,559 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.966
2021-09-30 09:11:33,353 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.944
INFO:tensorflow:epoch = 0.2317310363989997, learning_rate = 6.1895976e-06, loss = 0.0040201526, step = 834 (5.088 sec)
2021-09-30 09:11:34,074 [INFO] tensorflow: epoch = 0.2317310363989997, learning_rate = 6.1895976e-06, loss = 0.0040201526, step = 834 (5.088 sec)
2021-09-30 09:11:35,171 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.752
2021-09-30 09:11:36,976 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.849
2021-09-30 09:11:38,779 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.871
INFO:tensorflow:epoch = 0.2514587385384829, learning_rate = 6.303087e-06, loss = 0.002848207, step = 905 (5.133 sec)
2021-09-30 09:11:39,207 [INFO] tensorflow: epoch = 0.2514587385384829, learning_rate = 6.303087e-06, loss = 0.002848207, step = 905 (5.133 sec)
2021-09-30 09:11:40,581 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.877
2021-09-30 09:11:42,393 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.801
2021-09-30 09:11:44,211 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.747
INFO:tensorflow:epoch = 0.27090858571825505, learning_rate = 6.4170226e-06, loss = 0.0030826079, step = 975 (5.076 sec)
2021-09-30 09:11:44,284 [INFO] tensorflow: epoch = 0.27090858571825505, learning_rate = 6.4170226e-06, loss = 0.0030826079, step = 975 (5.076 sec)
2021-09-30 09:11:46,034 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.721
2021-09-30 09:11:47,862 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.674
INFO:tensorflow:epoch = 0.2903584328980272, learning_rate = 6.533012e-06, loss = 0.002698243, step = 1045 (5.108 sec)
2021-09-30 09:11:49,392 [INFO] tensorflow: epoch = 0.2903584328980272, learning_rate = 6.533012e-06, loss = 0.002698243, step = 1045 (5.108 sec)
2021-09-30 09:11:49,685 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.714
2021-09-30 09:11:51,504 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.747
INFO:tensorflow:global_step/sec: 13.8365
2021-09-30 09:11:51,722 [INFO] tensorflow: global_step/sec: 13.8365
2021-09-30 09:11:53,330 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.697
INFO:tensorflow:epoch = 0.30980828007779937, learning_rate = 6.6510975e-06, loss = 0.0031609733, step = 1115 (5.115 sec)
2021-09-30 09:11:54,506 [INFO] tensorflow: epoch = 0.30980828007779937, learning_rate = 6.6510975e-06, loss = 0.0031609733, step = 1115 (5.115 sec)
2021-09-30 09:11:55,176 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.542
2021-09-30 09:11:57,023 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.540
2021-09-30 09:11:58,875 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.503
INFO:tensorflow:epoch = 0.3289802722978605, learning_rate = 6.7695873e-06, loss = 0.00218269, step = 1184 (5.108 sec)
2021-09-30 09:11:59,614 [INFO] tensorflow: epoch = 0.3289802722978605, learning_rate = 6.7695873e-06, loss = 0.00218269, step = 1184 (5.108 sec)
2021-09-30 09:12:00,740 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.404
2021-09-30 09:12:02,613 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.348
2021-09-30 09:12:04,477 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.417
INFO:tensorflow:epoch = 0.3478744095582106, learning_rate = 6.8884206e-06, loss = 0.0021372773, step = 1252 (5.085 sec)
2021-09-30 09:12:04,699 [INFO] tensorflow: epoch = 0.3478744095582106, learning_rate = 6.8884206e-06, loss = 0.0021372773, step = 1252 (5.085 sec)
2021-09-30 09:12:06,331 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.489
2021-09-30 09:12:08,195 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.409
INFO:tensorflow:epoch = 0.3670464017782717, learning_rate = 7.0111378e-06, loss = 0.0047664717, step = 1321 (5.141 sec)
2021-09-30 09:12:09,840 [INFO] tensorflow: epoch = 0.3670464017782717, learning_rate = 7.0111378e-06, loss = 0.0047664717, step = 1321 (5.141 sec)
2021-09-30 09:12:10,062 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.391
2021-09-30 09:12:11,900 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.608
2021-09-30 09:12:13,762 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.427
INFO:tensorflow:epoch = 0.38621839399833285, learning_rate = 7.136042e-06, loss = 0.0022797526, step = 1390 (5.103 sec)
2021-09-30 09:12:14,943 [INFO] tensorflow: epoch = 0.38621839399833285, learning_rate = 7.136042e-06, loss = 0.0022797526, step = 1390 (5.103 sec)
2021-09-30 09:12:15,609 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.540
2021-09-30 09:12:17,470 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.433
INFO:tensorflow:global_step/sec: 13.4781
2021-09-30 09:12:18,358 [INFO] tensorflow: global_step/sec: 13.4781
2021-09-30 09:12:19,344 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.342
INFO:tensorflow:epoch = 0.40539038621839396, learning_rate = 7.2631706e-06, loss = 0.0021571326, step = 1459 (5.137 sec)
2021-09-30 09:12:20,080 [INFO] tensorflow: epoch = 0.40539038621839396, learning_rate = 7.2631706e-06, loss = 0.0021571326, step = 1459 (5.137 sec)
2021-09-30 09:12:21,196 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.501
2021-09-30 09:12:23,058 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.427
2021-09-30 09:12:24,921 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.423
INFO:tensorflow:epoch = 0.42456237843845507, learning_rate = 7.3925576e-06, loss = 0.0037793969, step = 1528 (5.142 sec)
2021-09-30 09:12:25,222 [INFO] tensorflow: epoch = 0.42456237843845507, learning_rate = 7.3925576e-06, loss = 0.0037793969, step = 1528 (5.142 sec)
2021-09-30 09:12:26,788 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.392
2021-09-30 09:12:28,665 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.323
INFO:tensorflow:epoch = 0.4434565156988052, learning_rate = 7.522334e-06, loss = 0.0023051952, step = 1596 (5.091 sec)
2021-09-30 09:12:30,313 [INFO] tensorflow: epoch = 0.4434565156988052, learning_rate = 7.522334e-06, loss = 0.0023051952, step = 1596 (5.091 sec)
2021-09-30 09:12:30,539 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.339
2021-09-30 09:12:32,426 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.250
2021-09-30 09:12:34,312 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.263
INFO:tensorflow:epoch = 0.4623506529591553, learning_rate = 7.654381e-06, loss = 0.002160594, step = 1664 (5.138 sec)
2021-09-30 09:12:35,452 [INFO] tensorflow: epoch = 0.4623506529591553, learning_rate = 7.654381e-06, loss = 0.002160594, step = 1664 (5.138 sec)
2021-09-30 09:12:36,204 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.214
2021-09-30 09:12:38,098 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.199
2021-09-30 09:12:39,994 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.188
INFO:tensorflow:epoch = 0.4812447902195054, learning_rate = 7.7887535e-06, loss = 0.0019203301, step = 1732 (5.150 sec)
2021-09-30 09:12:40,602 [INFO] tensorflow: epoch = 0.4812447902195054, learning_rate = 7.7887535e-06, loss = 0.0019203301, step = 1732 (5.150 sec)
2021-09-30 09:12:41,906 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.081
2021-09-30 09:12:43,806 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.156
INFO:tensorflow:global_step/sec: 13.2767
2021-09-30 09:12:45,398 [INFO] tensorflow: global_step/sec: 13.2767
INFO:tensorflow:epoch = 0.49986107252014444, learning_rate = 7.9234505e-06, loss = 0.0017554755, step = 1799 (5.098 sec)
2021-09-30 09:12:45,700 [INFO] tensorflow: epoch = 0.49986107252014444, learning_rate = 7.9234505e-06, loss = 0.0017554755, step = 1799 (5.098 sec)
2021-09-30 09:12:45,700 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.201
2021-09-30 09:12:47,592 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.220
2021-09-30 09:12:49,474 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.284
INFO:tensorflow:epoch = 0.5187552097804945, learning_rate = 8.06254e-06, loss = 0.0012976202, step = 1867 (5.149 sec)
2021-09-30 09:12:50,849 [INFO] tensorflow: epoch = 0.5187552097804945, learning_rate = 8.06254e-06, loss = 0.0012976202, step = 1867 (5.149 sec)
2021-09-30 09:12:51,389 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.059
2021-09-30 09:12:53,279 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.227
2021-09-30 09:12:55,154 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.333
INFO:tensorflow:epoch = 0.5376493470408447, learning_rate = 8.204077e-06, loss = 0.0012079268, step = 1935 (5.135 sec)
2021-09-30 09:12:55,984 [INFO] tensorflow: epoch = 0.5376493470408447, learning_rate = 8.204077e-06, loss = 0.0012079268, step = 1935 (5.135 sec)
2021-09-30 09:12:57,043 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.236
2021-09-30 09:12:58,927 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.272
2021-09-30 09:13:00,813 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.261
INFO:tensorflow:epoch = 0.5565434843011947, learning_rate = 8.348092e-06, loss = 0.001814571, step = 2003 (5.132 sec)
2021-09-30 09:13:01,116 [INFO] tensorflow: epoch = 0.5565434843011947, learning_rate = 8.348092e-06, loss = 0.001814571, step = 2003 (5.132 sec)
2021-09-30 09:13:02,700 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.247
2021-09-30 09:13:04,583 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.279
INFO:tensorflow:epoch = 0.5754376215615449, learning_rate = 8.494641e-06, loss = 0.0014310093, step = 2071 (5.116 sec)
2021-09-30 09:13:06,233 [INFO] tensorflow: epoch = 0.5754376215615449, learning_rate = 8.494641e-06, loss = 0.0014310093, step = 2071 (5.116 sec)
2021-09-30 09:13:06,458 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.338
2021-09-30 09:13:08,337 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.303
2021-09-30 09:13:10,222 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.268
INFO:tensorflow:epoch = 0.5943317588218949, learning_rate = 8.643757e-06, loss = 0.0013333114, step = 2139 (5.111 sec)
2021-09-30 09:13:11,344 [INFO] tensorflow: epoch = 0.5943317588218949, learning_rate = 8.643757e-06, loss = 0.0013333114, step = 2139 (5.111 sec)
2021-09-30 09:13:12,104 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.282
INFO:tensorflow:global_step/sec: 13.2572
2021-09-30 09:13:12,477 [INFO] tensorflow: global_step/sec: 13.2572
2021-09-30 09:13:13,993 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.236
2021-09-30 09:13:15,889 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.189
INFO:tensorflow:epoch = 0.612948041122534, learning_rate = 8.793241e-06, loss = 0.0013000339, step = 2206 (5.075 sec)
2021-09-30 09:13:16,418 [INFO] tensorflow: epoch = 0.612948041122534, learning_rate = 8.793241e-06, loss = 0.0013000339, step = 2206 (5.075 sec)
2021-09-30 09:13:17,771 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.290
2021-09-30 09:13:19,636 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.407
INFO:tensorflow:epoch = 0.631842178382884, learning_rate = 8.947606e-06, loss = 0.001094278, step = 2274 (5.081 sec)
2021-09-30 09:13:21,499 [INFO] tensorflow: epoch = 0.631842178382884, learning_rate = 8.947606e-06, loss = 0.001094278, step = 2274 (5.081 sec)
2021-09-30 09:13:21,499 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.417
2021-09-30 09:13:23,370 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.365
2021-09-30 09:13:25,255 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.261
INFO:tensorflow:epoch = 0.6507363156432342, learning_rate = 9.104672e-06, loss = 0.0010227972, step = 2342 (5.104 sec)
2021-09-30 09:13:26,603 [INFO] tensorflow: epoch = 0.6507363156432342, learning_rate = 9.104672e-06, loss = 0.0010227972, step = 2342 (5.104 sec)
2021-09-30 09:13:27,134 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.310
2021-09-30 09:13:29,007 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.351
2021-09-30 09:13:30,893 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.253
INFO:tensorflow:epoch = 0.6696304529035843, learning_rate = 9.264504e-06, loss = 0.0012531078, step = 2410 (5.123 sec)
2021-09-30 09:13:31,726 [INFO] tensorflow: epoch = 0.6696304529035843, learning_rate = 9.264504e-06, loss = 0.0012531078, step = 2410 (5.123 sec)
2021-09-30 09:13:32,791 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.176
2021-09-30 09:13:34,686 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.199
2021-09-30 09:13:36,573 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.247
INFO:tensorflow:epoch = 0.6885245901639344, learning_rate = 9.427134e-06, loss = 0.0037629707, step = 2478 (5.151 sec)
2021-09-30 09:13:36,876 [INFO] tensorflow: epoch = 0.6885245901639344, learning_rate = 9.427134e-06, loss = 0.0037629707, step = 2478 (5.151 sec)
2021-09-30 09:13:38,461 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.247
INFO:tensorflow:global_step/sec: 13.2774
2021-09-30 09:13:39,516 [INFO] tensorflow: global_step/sec: 13.2774
2021-09-30 09:13:40,350 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.232
INFO:tensorflow:epoch = 0.7074187274242845, learning_rate = 9.592627e-06, loss = 0.0023811185, step = 2546 (5.133 sec)
2021-09-30 09:13:42,009 [INFO] tensorflow: epoch = 0.7074187274242845, learning_rate = 9.592627e-06, loss = 0.0023811185, step = 2546 (5.133 sec)
2021-09-30 09:13:42,236 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.261
2021-09-30 09:13:44,124 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.244
2021-09-30 09:13:46,006 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.284
INFO:tensorflow:epoch = 0.7263128646846345, learning_rate = 9.7610155e-06, loss = 0.0012716708, step = 2614 (5.135 sec)
2021-09-30 09:13:47,144 [INFO] tensorflow: epoch = 0.7263128646846345, learning_rate = 9.7610155e-06, loss = 0.0012716708, step = 2614 (5.135 sec)
2021-09-30 09:13:47,892 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.260
2021-09-30 09:13:49,776 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.271
2021-09-30 09:13:51,653 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.324
INFO:tensorflow:epoch = 0.7452070019449847, learning_rate = 9.93237e-06, loss = 0.0011890521, step = 2682 (5.112 sec)
2021-09-30 09:13:52,256 [INFO] tensorflow: epoch = 0.7452070019449847, learning_rate = 9.93237e-06, loss = 0.0011890521, step = 2682 (5.112 sec)
2021-09-30 09:13:53,535 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.283
2021-09-30 09:13:55,407 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.353
2021-09-30 09:13:57,303 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.192
INFO:tensorflow:epoch = 0.7641011392053347, learning_rate = 1.0106723e-05, loss = 0.001395291, step = 2750 (5.123 sec)
2021-09-30 09:13:57,379 [INFO] tensorflow: epoch = 0.7641011392053347, learning_rate = 1.0106723e-05, loss = 0.001395291, step = 2750 (5.123 sec)
2021-09-30 09:13:59,190 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.249
2021-09-30 09:14:01,090 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.162
INFO:tensorflow:epoch = 0.7829952764656849, learning_rate = 1.0284146e-05, loss = 0.0020224205, step = 2818 (5.131 sec)
2021-09-30 09:14:02,510 [INFO] tensorflow: epoch = 0.7829952764656849, learning_rate = 1.0284146e-05, loss = 0.0020224205, step = 2818 (5.131 sec)
2021-09-30 09:14:02,960 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.370
2021-09-30 09:14:04,866 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.119
INFO:tensorflow:global_step/sec: 13.2562
2021-09-30 09:14:06,597 [INFO] tensorflow: global_step/sec: 13.2562
2021-09-30 09:14:06,747 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.291
INFO:tensorflow:epoch = 0.801889413726035, learning_rate = 1.0464672e-05, loss = 0.0020490424, step = 2886 (5.151 sec)
2021-09-30 09:14:07,661 [INFO] tensorflow: epoch = 0.801889413726035, learning_rate = 1.0464672e-05, loss = 0.0020490424, step = 2886 (5.151 sec)
2021-09-30 09:14:08,635 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.245
2021-09-30 09:14:10,531 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.184
2021-09-30 09:14:12,412 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.295
INFO:tensorflow:epoch = 0.820783550986385, learning_rate = 1.0648379e-05, loss = 0.0020841898, step = 2954 (5.121 sec)
2021-09-30 09:14:12,783 [INFO] tensorflow: epoch = 0.820783550986385, learning_rate = 1.0648379e-05, loss = 0.0020841898, step = 2954 (5.121 sec)
2021-09-30 09:14:14,303 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.220
2021-09-30 09:14:16,198 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.198
INFO:tensorflow:epoch = 0.839399833287024, learning_rate = 1.0832532e-05, loss = 0.0012533535, step = 3021 (5.074 sec)
2021-09-30 09:14:17,857 [INFO] tensorflow: epoch = 0.839399833287024, learning_rate = 1.0832532e-05, loss = 0.0012533535, step = 3021 (5.074 sec)
2021-09-30 09:14:18,086 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.242
2021-09-30 09:14:19,980 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.201
2021-09-30 09:14:21,861 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.291
INFO:tensorflow:epoch = 0.8582939705473742, learning_rate = 1.10226865e-05, loss = 0.0024614893, step = 3089 (5.129 sec)
2021-09-30 09:14:22,986 [INFO] tensorflow: epoch = 0.8582939705473742, learning_rate = 1.10226865e-05, loss = 0.0024614893, step = 3089 (5.129 sec)
2021-09-30 09:14:23,737 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.329
2021-09-30 09:14:25,609 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.362
2021-09-30 09:14:27,489 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.298
INFO:tensorflow:epoch = 0.8771881078077243, learning_rate = 1.121619e-05, loss = 0.0010079967, step = 3157 (5.104 sec)
2021-09-30 09:14:28,090 [INFO] tensorflow: epoch = 0.8771881078077243, learning_rate = 1.121619e-05, loss = 0.0010079967, step = 3157 (5.104 sec)
2021-09-30 09:14:29,389 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.156
2021-09-30 09:14:31,272 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.282
2021-09-30 09:14:33,147 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.335
INFO:tensorflow:epoch = 0.8960822450680744, learning_rate = 1.1413078e-05, loss = 0.0010068857, step = 3225 (5.133 sec)
2021-09-30 09:14:33,223 [INFO] tensorflow: epoch = 0.8960822450680744, learning_rate = 1.1413078e-05, loss = 0.0010068857, step = 3225 (5.133 sec)
INFO:tensorflow:global_step/sec: 13.2588
2021-09-30 09:14:33,674 [INFO] tensorflow: global_step/sec: 13.2588
2021-09-30 09:14:35,031 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.271
2021-09-30 09:14:36,918 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.254
INFO:tensorflow:epoch = 0.9149763823284245, learning_rate = 1.1613434e-05, loss = 0.0016679382, step = 3293 (5.127 sec)
2021-09-30 09:14:38,350 [INFO] tensorflow: epoch = 0.9149763823284245, learning_rate = 1.1613434e-05, loss = 0.0016679382, step = 3293 (5.127 sec)
2021-09-30 09:14:38,799 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.292
2021-09-30 09:14:40,674 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.330
2021-09-30 09:14:42,547 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.357
INFO:tensorflow:epoch = 0.9338705195887745, learning_rate = 1.1817296e-05, loss = 0.0011098281, step = 3361 (5.104 sec)
2021-09-30 09:14:43,454 [INFO] tensorflow: epoch = 0.9338705195887745, learning_rate = 1.1817296e-05, loss = 0.0011098281, step = 3361 (5.104 sec)
2021-09-30 09:14:44,434 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.247
2021-09-30 09:14:46,326 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.215
2021-09-30 09:14:48,220 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.205
INFO:tensorflow:epoch = 0.9527646568491247, learning_rate = 1.2024737e-05, loss = 0.0009869031, step = 3429 (5.144 sec)
2021-09-30 09:14:48,598 [INFO] tensorflow: epoch = 0.9527646568491247, learning_rate = 1.2024737e-05, loss = 0.0009869031, step = 3429 (5.144 sec)
2021-09-30 09:14:50,113 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.205
2021-09-30 09:14:51,994 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.296
INFO:tensorflow:epoch = 0.9716587941094748, learning_rate = 1.2235831e-05, loss = 0.0010173235, step = 3497 (5.119 sec)
2021-09-30 09:14:53,717 [INFO] tensorflow: epoch = 0.9716587941094748, learning_rate = 1.2235831e-05, loss = 0.0010173235, step = 3497 (5.119 sec)
2021-09-30 09:14:53,869 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.332
2021-09-30 09:14:55,749 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.303
2021-09-30 09:14:57,645 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.189
INFO:tensorflow:epoch = 0.9905529313698249, learning_rate = 1.2450618e-05, loss = 0.0017017268, step = 3565 (5.142 sec)
2021-09-30 09:14:58,859 [INFO] tensorflow: epoch = 0.9905529313698249, learning_rate = 1.2450618e-05, loss = 0.0017017268, step = 3565 (5.142 sec)
2021-09-30 09:14:59,539 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 13.196
INFO:tensorflow:global_step/sec: 13.2666
2021-09-30 09:15:00,734 [INFO] tensorflow: global_step/sec: 13.2666
f8f3acc47421:39:51 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
f8f3acc47421:39:51 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
f8f3acc47421:39:51 [0] NCCL INFO NET/IB : No device found.
f8f3acc47421:39:51 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.2<0>
f8f3acc47421:39:51 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
f8f3acc47421:39:51 [0] NCCL INFO Channel 00/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 01/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 02/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 03/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 04/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 05/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 06/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 07/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 08/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 09/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 10/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 11/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 12/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 13/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 14/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 15/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 16/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 17/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 18/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 19/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 20/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 21/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 22/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 23/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 24/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 25/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 26/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 27/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 28/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 29/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 30/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Channel 31/32 : 0
f8f3acc47421:39:51 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
f8f3acc47421:39:51 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
f8f3acc47421:39:51 [0] NCCL INFO comm 0x7f6400387390 rank 0 nranks 1 cudaDev 0 busId 40 - Init COMPLETE
2021-09-30 09:15:01,762 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 399, 0.00s/step

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 843, in
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 832, in
  File "", line 2, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 821, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 702, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 638, in train_gridbox
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 154, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py", line 79, in after_run
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py", line 85, in validate
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/evaluation/evaluation.py", line 166, in evaluate
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/postprocessing.py", line 146, in cluster_predictions
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/cluster.py", line 45, in cluster_predictions
AssertionError
2021-09-30 09:15:04,882 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.



Layer (type) Output Shape Param # Connected to

output_cov (Conv2D) (None, 4, 24, 78) 2052 block_4b_relu[0][0]
