Hi experts,
Training starts, but the precision I get is always 0. Can someone help me with this? I am really new to this field.
Thanks,
Kai
2021-06-15 20:40:56,001 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:43: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:67: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:67: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2021-06-15 12:41:04,768 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2021-06-15 12:41:04,768 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2021-06-15 12:41:05,442 [INFO] __main__: Loading experiment spec at /workspace/tlt-experiments/detectnet_v2/specs/detectnet_v2_retrain_trafficcamnet_car_kitti.txt.
2021-06-15 12:41:05,444 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/detectnet_v2/specs/detectnet_v2_retrain_trafficcamnet_car_kitti.txt
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
2021-06-15 12:41:05,600 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:107: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
2021-06-15 12:41:05,601 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:110: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
2021-06-15 12:41:05,604 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:113: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
2021-06-15 12:41:05,638 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2021-06-15 12:41:05,640 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
2021-06-15 12:41:06,641 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
2021-06-15 12:41:07,900 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
2021-06-15 12:41:07,900 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2021-06-15 12:41:08,207 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/objectives/bbox_objective.py:61: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
2021-06-15 12:41:12,277 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/objectives/bbox_objective.py:61: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
INFO:tensorflow:DriveNet default L1 loss function will be used.
2021-06-15 12:41:12,277 [INFO] tensorflow: DriveNet default L1 loss function will be used.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 544, 960) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 272, 480) 0 conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation) (None, 64, 136, 240) 0 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_1[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 136, 240) 0 block_1a_conv_2[0][0]
block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation) (None, 64, 136, 240) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation) (None, 64, 136, 240) 0 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 block_1a_relu[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 136, 240) 0 block_1b_conv_2[0][0]
block_1b_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation) (None, 64, 136, 240) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 68, 120) 73856 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation) (None, 128, 68, 120) 0 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320 block_1b_relu[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 68, 120) 0 block_2a_conv_2[0][0]
block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation) (None, 128, 68, 120) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation) (None, 128, 68, 120) 0 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 68, 120) 16512 block_2a_relu[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 68, 120) 0 block_2b_conv_2[0][0]
block_2b_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation) (None, 128, 68, 120) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 34, 60) 295168 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation) (None, 256, 34, 60) 0 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60) 33024 block_2b_relu[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 34, 60) 0 block_3a_conv_2[0][0]
block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation) (None, 256, 34, 60) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation) (None, 256, 34, 60) 0 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 34, 60) 65792 block_3a_relu[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 34, 60) 0 block_3b_conv_2[0][0]
block_3b_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation) (None, 256, 34, 60) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 34, 60) 1180160 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation) (None, 512, 34, 60) 0 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60) 131584 block_3b_relu[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 34, 60) 0 block_4a_conv_2[0][0]
block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation) (None, 512, 34, 60) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation) (None, 512, 34, 60) 0 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 34, 60) 262656 block_4a_relu[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 34, 60) 0 block_4b_conv_2[0][0]
block_4b_conv_shortcut[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation) (None, 512, 34, 60) 0 add_8[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D) (None, 4, 34, 60) 2052 block_4b_relu[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D) (None, 1, 34, 60) 513 block_4b_relu[0][0]
==================================================================================================
Total params: 11,527,557
Trainable params: 11,527,557
Non-trainable params: 0
__________________________________________________________________________________________________
2021-06-15 12:41:12,324 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-06-15 12:41:12,324 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-06-15 12:41:12,324 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-06-15 12:41:12,325 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2021-06-15 12:41:12,325 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 188, number of sources: 1, batch size per gpu: 4, steps: 47
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2021-06-15 12:41:12,370 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c748>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c748>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:12,413 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c748>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c748>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:12,441 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-06-15 12:41:12,793 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2021-06-15 12:41:12,801 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-06-15 12:41:12,801 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c227a90>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c227a90>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:12,821 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c227a90>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c227a90>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:13,516 [INFO] __main__: Found 188 samples in training set
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.
2021-06-15 12:41:13,669 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/rasterizers/bbox_rasterizer.py:347: The name tf.bincount is deprecated. Please use tf.math.bincount instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:89: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
2021-06-15 12:41:13,828 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:89: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
2021-06-15 12:41:13,851 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/training_proto_utilities.py:36: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.
2021-06-15 12:41:13,956 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_functions.py:17: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
2021-06-15 12:41:13,970 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/cost_function/cost_auto_weight_hook.py:235: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py:574: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
2021-06-15 12:41:13,975 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py:574: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
2021-06-15 12:41:15,597 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2021-06-15 12:41:15,597 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2021-06-15 12:41:15,597 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2021-06-15 12:41:15,597 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2021-06-15 12:41:15,598 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 30, number of sources: 1, batch size per gpu: 4, steps: 8
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c7f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c7f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:15,612 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c7f0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7fe565e4c7f0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:15,639 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2021-06-15 12:41:15,981 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2021-06-15 12:41:16,139 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2021-06-15 12:41:16,139 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c64f550>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c64f550>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:16,159 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c64f550>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7fe54c64f550>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-06-15 12:41:16,667 [INFO] __main__: Found 30 samples in validation set
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py:40: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.
2021-06-15 12:41:17,127 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py:40: The name tf.summary.FileWriterCache is deprecated. Please use tf.compat.v1.summary.FileWriterCache instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:105: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.
2021-06-15 12:41:18,224 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py:105: The name tf.train.Scaffold is deprecated. Please use tf.compat.v1.train.Scaffold instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.
2021-06-15 12:41:18,224 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:14: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.
2021-06-15 12:41:18,225 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:15: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
2021-06-15 12:41:18,226 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/common/graph/initializers.py:16: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.
2021-06-15 12:41:18,229 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.
2021-06-15 12:41:18,229 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:74: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.
2021-06-15 12:41:18,230 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:74: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.
INFO:tensorflow:Create CheckpointSaverHook.
2021-06-15 12:41:18,230 [INFO] tensorflow: Create CheckpointSaverHook.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:100: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.
2021-06-15 12:41:18,230 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:100: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.
2021-06-15 12:41:18,231 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/utilities.py:140: The name tf.train.SingularMonitoredSession is deprecated. Please use tf.compat.v1.train.SingularMonitoredSession instead.
INFO:tensorflow:Graph was finalized.
2021-06-15 12:41:18,996 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2021-06-15 12:41:20,726 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-06-15 12:41:21,486 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2021-06-15 12:41:27,227 [INFO] tensorflow: Saving checkpoints for step-0.
INFO:tensorflow:epoch = 0.0, loss = 0.06009046, step = 0
2021-06-15 12:41:47,496 [INFO] tensorflow: epoch = 0.0, loss = 0.06009046, step = 0
2021-06-15 12:41:47,499 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 0/120: loss: 0.06009 Time taken: 0:00:00 ETA: 0:00:00
2021-06-15 12:41:47,499 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 0.535
INFO:tensorflow:global_step/sec: 0.819422
2021-06-15 12:41:52,379 [INFO] tensorflow: global_step/sec: 0.819422
INFO:tensorflow:epoch = 0.1276595744680851, loss = 0.060291, step = 6 (7.296 sec)
2021-06-15 12:41:54,792 [INFO] tensorflow: epoch = 0.1276595744680851, loss = 0.060291, step = 6 (7.296 sec)
INFO:tensorflow:global_step/sec: 1.23529
2021-06-15 12:41:55,617 [INFO] tensorflow: global_step/sec: 1.23529
INFO:tensorflow:global_step/sec: 2.31361
2021-06-15 12:41:57,346 [INFO] tensorflow: global_step/sec: 2.31361
INFO:tensorflow:global_step/sec: 2.4405
2021-06-15 12:41:58,985 [INFO] tensorflow: global_step/sec: 2.4405
INFO:tensorflow:epoch = 0.40425531914893614, loss = 0.06011029, step = 19 (5.486 sec)
2021-06-15 12:42:00,278 [INFO] tensorflow: epoch = 0.40425531914893614, loss = 0.06011029, step = 19 (5.486 sec)
INFO:tensorflow:global_step/sec: 2.23885
2021-06-15 12:42:00,772 [INFO] tensorflow: global_step/sec: 2.23885
INFO:tensorflow:global_step/sec: 2.36717
2021-06-15 12:42:02,462 [INFO] tensorflow: global_step/sec: 2.36717
2021-06-15 12:42:02,465 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 4.457
INFO:tensorflow:global_step/sec: 2.39177
2021-06-15 12:42:04,134 [INFO] tensorflow: global_step/sec: 2.39177
INFO:tensorflow:epoch = 0.6808510638297872, loss = 0.059858073, step = 32 (5.612 sec)
2021-06-15 12:42:05,890 [INFO] tensorflow: epoch = 0.6808510638297872, loss = 0.059858073, step = 32 (5.612 sec)
INFO:tensorflow:global_step/sec: 2.27592
2021-06-15 12:42:05,892 [INFO] tensorflow: global_step/sec: 2.27592
INFO:tensorflow:global_step/sec: 2.71606
2021-06-15 12:42:07,364 [INFO] tensorflow: global_step/sec: 2.71606
INFO:tensorflow:global_step/sec: 2.53659
2021-06-15 12:42:08,941 [INFO] tensorflow: global_step/sec: 2.53659
INFO:tensorflow:global_step/sec: 2.4402
2021-06-15 12:42:10,581 [INFO] tensorflow: global_step/sec: 2.4402
INFO:tensorflow:epoch = 0.9787234042553191, loss = 0.060073316, step = 46 (5.586 sec)
2021-06-15 12:42:11,476 [INFO] tensorflow: epoch = 0.9787234042553191, loss = 0.060073316, step = 46 (5.586 sec)
20c71d84dbaa:40:52 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.3<0>
20c71d84dbaa:40:52 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
20c71d84dbaa:40:52 [0] NCCL INFO NET/IB : No device found.
20c71d84dbaa:40:52 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.3<0>
20c71d84dbaa:40:52 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
20c71d84dbaa:40:52 [0] NCCL INFO Channel 00/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 01/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 02/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 03/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 04/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 05/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 06/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 07/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 08/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 09/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 10/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 11/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 12/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 13/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 14/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 15/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 16/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 17/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 18/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 19/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 20/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 21/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 22/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 23/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 24/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 25/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 26/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 27/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 28/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 29/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 30/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Channel 31/32 : 0
20c71d84dbaa:40:52 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
20c71d84dbaa:40:52 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
20c71d84dbaa:40:52 [0] NCCL INFO comm 0x7fe55c37efb0 rank 0 nranks 1 cudaDev 0 busId 80 - Init COMPLETE
2021-06-15 12:42:11,890 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 7, 0.00s/step
Epoch 1/120
=========================
Validation cost: 0.000956
Mean average_precision (in %): 0.0000
class name      average precision (in %)
------------  --------------------------
car                                    0
Median Inference Time: 0.061750
INFO:tensorflow:epoch = 1.0, loss = 0.0009655007, step = 47 (9.809 sec)
2021-06-15 12:42:21,285 [INFO] tensorflow: epoch = 1.0, loss = 0.0009655007, step = 47 (9.809 sec)
2021-06-15 12:42:21,286 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 1/120: loss: 0.00097 Time taken: 0:00:40.904097 ETA: 1:21:07.587496
INFO:tensorflow:global_step/sec: 0.358527
2021-06-15 12:42:21,737 [INFO] tensorflow: global_step/sec: 0.358527
2021-06-15 12:42:22,145 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 5.081
INFO:tensorflow:global_step/sec: 2.5942
2021-06-15 12:42:23,279 [INFO] tensorflow: global_step/sec: 2.5942
INFO:tensorflow:global_step/sec: 2.74829
2021-06-15 12:42:24,735 [INFO] tensorflow: global_step/sec: 2.74829
INFO:tensorflow:global_step/sec: 2.57983
2021-06-15 12:42:26,285 [INFO] tensorflow: global_step/sec: 2.57983
INFO:tensorflow:epoch = 1.297872340425532, loss = 0.0010678559, step = 61 (5.399 sec)
2021-06-15 12:42:26,684 [INFO] tensorflow: epoch = 1.297872340425532, loss = 0.0010678559, step = 61 (5.399 sec)
INFO:tensorflow:global_step/sec: 2.56197
2021-06-15 12:42:27,846 [INFO] tensorflow: global_step/sec: 2.56197
INFO:tensorflow:global_step/sec: 2.22231
2021-06-15 12:42:29,646 [INFO] tensorflow: global_step/sec: 2.22231
INFO:tensorflow:global_step/sec: 2.41353
2021-06-15 12:42:31,304 [INFO] tensorflow: global_step/sec: 2.41353
INFO:tensorflow:epoch = 1.574468085106383, loss = 0.000981944, step = 74 (5.451 sec)
2021-06-15 12:42:32,135 [INFO] tensorflow: epoch = 1.574468085106383, loss = 0.000981944, step = 74 (5.451 sec)
2021-06-15 12:42:32,135 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 10.010
INFO:tensorflow:global_step/sec: 2.5232
2021-06-15 12:42:32,889 [INFO] tensorflow: global_step/sec: 2.5232
INFO:tensorflow:global_step/sec: 2.41112
2021-06-15 12:42:34,548 [INFO] tensorflow: global_step/sec: 2.41112
INFO:tensorflow:global_step/sec: 2.4484
2021-06-15 12:42:36,182 [INFO] tensorflow: global_step/sec: 2.4484
INFO:tensorflow:epoch = 1.872340425531915, loss = 0.0012640116, step = 88 (5.691 sec)
2021-06-15 12:42:37,826 [INFO] tensorflow: epoch = 1.872340425531915, loss = 0.0012640116, step = 88 (5.691 sec)
INFO:tensorflow:global_step/sec: 2.4312
2021-06-15 12:42:37,827 [INFO] tensorflow: global_step/sec: 2.4312
INFO:tensorflow:global_step/sec: 2.53872
2021-06-15 12:42:39,403 [INFO] tensorflow: global_step/sec: 2.53872
2021-06-15 12:42:40,132 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 2/120: loss: 0.00124 Time taken: 0:00:18.815971 ETA: 0:37:00.284538
INFO:tensorflow:global_step/sec: 2.68034
2021-06-15 12:42:40,895 [INFO] tensorflow: global_step/sec: 2.68034
2021-06-15 12:42:42,141 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 9.994
INFO:tensorflow:global_step/sec: 2.51051
2021-06-15 12:42:42,488 [INFO] tensorflow: global_step/sec: 2.51051
INFO:tensorflow:epoch = 2.1702127659574466, loss = 0.0015662777, step = 102 (5.410 sec)
2021-06-15 12:42:43,235 [INFO] tensorflow: epoch = 2.1702127659574466, loss = 0.0015662777, step = 102 (5.410 sec)
INFO:tensorflow:global_step/sec: 2.59284
2021-06-15 12:42:44,031 [INFO] tensorflow: global_step/sec: 2.59284
INFO:tensorflow:global_step/sec: 2.59594
2021-06-15 12:42:45,572 [INFO] tensorflow: global_step/sec: 2.59594
INFO:tensorflow:global_step/sec: 2.30278
2021-06-15 12:42:47,309 [INFO] tensorflow: global_step/sec: 2.30278
INFO:tensorflow:epoch = 2.4680851063829787, loss = 0.000651872, step = 116 (5.758 sec)
2021-06-15 12:42:48,993 [INFO] tensorflow: epoch = 2.4680851063829787, loss = 0.000651872, step = 116 (5.758 sec)
INFO:tensorflow:global_step/sec: 2.37283
2021-06-15 12:42:48,994 [INFO] tensorflow: global_step/sec: 2.37283
INFO:tensorflow:global_step/sec: 2.45762
2021-06-15 12:42:50,622 [INFO] tensorflow: global_step/sec: 2.45762
INFO:tensorflow:global_step/sec: 2.40992
2021-06-15 12:42:52,282 [INFO] tensorflow: global_step/sec: 2.40992
2021-06-15 12:42:52,283 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 9.860
INFO:tensorflow:global_step/sec: 2.53232
2021-06-15 12:42:53,861 [INFO] tensorflow: global_step/sec: 2.53232
INFO:tensorflow:epoch = 2.7659574468085104, loss = 0.00043913763, step = 130 (5.573 sec)
2021-06-15 12:42:54,567 [INFO] tensorflow: epoch = 2.7659574468085104, loss = 0.00043913763, step = 130 (5.573 sec)
INFO:tensorflow:global_step/sec: 2.67132
2021-06-15 12:42:55,359 [INFO] tensorflow: global_step/sec: 2.67132
INFO:tensorflow:global_step/sec: 2.4208
2021-06-15 12:42:57,011 [INFO] tensorflow: global_step/sec: 2.4208
INFO:tensorflow:global_step/sec: 2.75674
2021-06-15 12:42:58,462 [INFO] tensorflow: global_step/sec: 2.75674
2021-06-15 12:42:58,878 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 3/120: loss: 0.00096 Time taken: 0:00:18.719233 ETA: 0:36:30.150265
INFO:tensorflow:epoch = 3.0638297872340425, loss = 0.00080436235, step = 144 (5.759 sec)
2021-06-15 12:43:00,325 [INFO] tensorflow: epoch = 3.0638297872340425, loss = 0.00080436235, step = 144 (5.759 sec)
INFO:tensorflow:global_step/sec: 2.14549
2021-06-15 12:43:00,327 [INFO] tensorflow: global_step/sec: 2.14549
INFO:tensorflow:global_step/sec: 2.64888
2021-06-15 12:43:01,837 [INFO] tensorflow: global_step/sec: 2.64888
2021-06-15 12:43:02,252 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 10.031
INFO:tensorflow:global_step/sec: 2.31762
2021-06-15 12:43:03,563 [INFO] tensorflow: global_step/sec: 2.31762
INFO:tensorflow:global_step/sec: 2.56349
2021-06-15 12:43:05,123 [INFO] tensorflow: global_step/sec: 2.56349
INFO:tensorflow:epoch = 3.361702127659574, loss = 0.00094284175, step = 158 (5.568 sec)
2021-06-15 12:43:05,893 [INFO] tensorflow: epoch = 3.361702127659574, loss = 0.00094284175, step = 158 (5.568 sec)
INFO:tensorflow:global_step/sec: 2.60153
2021-06-15 12:43:06,661 [INFO] tensorflow: global_step/sec: 2.60153
INFO:tensorflow:global_step/sec: 2.4485
2021-06-15 12:43:08,294 [INFO] tensorflow: global_step/sec: 2.4485
INFO:tensorflow:global_step/sec: 2.49224
2021-06-15 12:43:09,899 [INFO] tensorflow: global_step/sec: 2.49224
INFO:tensorflow:epoch = 3.6595744680851063, loss = 0.0015221415, step = 172 (5.468 sec)
2021-06-15 12:43:11,362 [INFO] tensorflow: epoch = 3.6595744680851063, loss = 0.0015221415, step = 172 (5.468 sec)
INFO:tensorflow:global_step/sec: 2.73267
2021-06-15 12:43:11,363 [INFO] tensorflow: global_step/sec: 2.73267
2021-06-15 12:43:12,095 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 10.160
INFO:tensorflow:global_step/sec: 2.60963
2021-06-15 12:43:12,896 [INFO] tensorflow: global_step/sec: 2.60963
INFO:tensorflow:global_step/sec: 2.61323
2021-06-15 12:43:14,426 [INFO] tensorflow: global_step/sec: 2.61323
INFO:tensorflow:global_step/sec: 2.44023
2021-06-15 12:43:16,066 [INFO] tensorflow: global_step/sec: 2.44023
INFO:tensorflow:epoch = 3.957446808510638, loss = 0.0007857586, step = 186 (5.499 sec)
2021-06-15 12:43:16,861 [INFO] tensorflow: epoch = 3.957446808510638, loss = 0.0007857586, step = 186 (5.499 sec)
INFO:tensorflow:global_step/sec: 2.48045
2021-06-15 12:43:17,678 [INFO] tensorflow: global_step/sec: 2.48045
2021-06-15 12:43:17,680 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 4/120: loss: 0.00131 Time taken: 0:00:18.802958 ETA: 0:36:21.143074
INFO:tensorflow:global_step/sec: 2.394
2021-06-15 12:43:19,349 [INFO] tensorflow: global_step/sec: 2.394
INFO:tensorflow:global_step/sec: 2.37207
2021-06-15 12:43:21,035 [INFO] tensorflow: global_step/sec: 2.37207
2021-06-15 12:43:22,289 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 9.810
INFO:tensorflow:epoch = 4.25531914893617, loss = 0.0013742528, step = 200 (5.792 sec)
2021-06-15 12:43:22,653 [INFO] tensorflow: epoch = 4.25531914893617, loss = 0.0013742528, step = 200 (5.792 sec)
INFO:tensorflow:global_step/sec: 2.47044
2021-06-15 12:43:22,654 [INFO] tensorflow: global_step/sec: 2.47044
INFO:tensorflow:global_step/sec: 2.37571
2021-06-15 12:43:24,338 [INFO] tensorflow: global_step/sec: 2.37571
INFO:tensorflow:global_step/sec: 2.25377
2021-06-15 12:43:26,113 [INFO] tensorflow: global_step/sec: 2.25377
INFO:tensorflow:global_step/sec: 2.50197
2021-06-15 12:43:27,712 [INFO] tensorflow: global_step/sec: 2.50197
INFO:tensorflow:epoch = 4.531914893617021, loss = 0.001304254, step = 213 (5.494 sec)
2021-06-15 12:43:28,147 [INFO] tensorflow: epoch = 4.531914893617021, loss = 0.001304254, step = 213 (5.494 sec)
INFO:tensorflow:global_step/sec: 2.47797
2021-06-15 12:43:29,326 [INFO] tensorflow: global_step/sec: 2.47797
INFO:tensorflow:global_step/sec: 2.4851
2021-06-15 12:43:30,936 [INFO] tensorflow: global_step/sec: 2.4851
INFO:tensorflow:global_step/sec: 2.25075
2021-06-15 12:43:32,713 [INFO] tensorflow: global_step/sec: 2.25075
2021-06-15 12:43:32,714 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 9.593
INFO:tensorflow:epoch = 4.829787234042553, loss = 0.0007802945, step = 227 (5.834 sec)
2021-06-15 12:43:33,980 [INFO] tensorflow: epoch = 4.829787234042553, loss = 0.0007802945, step = 227 (5.834 sec)
INFO:tensorflow:global_step/sec: 2.4641
2021-06-15 12:43:34,336 [INFO] tensorflow: global_step/sec: 2.4641
INFO:tensorflow:global_step/sec: 2.64397
2021-06-15 12:43:35,849 [INFO] tensorflow: global_step/sec: 2.64397
2021-06-15 12:43:37,131 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 5/120: loss: 0.00082 Time taken: 0:00:19.484348 ETA: 0:37:20.700027
INFO:tensorflow:global_step/sec: 2.27821
2021-06-15 12:43:37,605 [INFO] tensorflow: global_step/sec: 2.27821
INFO:tensorflow:global_step/sec: 2.55934
2021-06-15 12:43:39,168 [INFO] tensorflow: global_step/sec: 2.55934
INFO:tensorflow:epoch = 5.127659574468085, loss = 0.0010832879, step = 241 (5.597 sec)
2021-06-15 12:43:39,577 [INFO] tensorflow: epoch = 5.127659574468085, loss = 0.0010832879, step = 241 (5.597 sec)
INFO:tensorflow:global_step/sec: 2.34384
2021-06-15 12:43:40,874 [INFO] tensorflow: global_step/sec: 2.34384
INFO:tensorflow:global_step/sec: 2.47716
2021-06-15 12:43:42,489 [INFO] tensorflow: global_step/sec: 2.47716
2021-06-15 12:43:42,930 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 9.788
INFO:tensorflow:global_step/sec: 2.3761
2021-06-15 12:43:44,172 [INFO] tensorflow: global_step/sec: 2.3761
INFO:tensorflow:epoch = 5.425531914893617, loss = 0.001108178, step = 255 (5.735 sec)
2021-06-15 12:43:45,312 [INFO] tensorflow: epoch = 5.425531914893617, loss = 0.001108178, step = 255 (5.735 sec)
INFO:tensorflow:global_step/sec: 2.619
2021-06-15 12:43:45,700 [INFO] tensorflow: global_step/sec: 2.619
INFO:tensorflow:global_step/sec: 2.39603
2021-06-15 12:43:47,369 [INFO] tensorflow: global_step/sec: 2.39603
INFO:tensorflow:global_step/sec: 2.46389
2021-06-15 12:43:48,993 [INFO] tensorflow: global_step/sec: 2.46389
INFO:tensorflow:global_step/sec: 2.70787
2021-06-15 12:43:50,470 [INFO] tensorflow: global_step/sec: 2.70787
INFO:tensorflow:epoch = 5.723404255319148, loss = 0.00061569334, step = 269 (5.557 sec)
2021-06-15 12:43:50,868 [INFO] tensorflow: epoch = 5.723404255319148, loss = 0.00061569334, step = 269 (5.557 sec)
INFO:tensorflow:global_step/sec: 2.56318
2021-06-15 12:43:52,030 [INFO] tensorflow: global_step/sec: 2.56318
2021-06-15 12:43:52,733 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 10.202
INFO:tensorflow:global_step/sec: 2.78304
2021-06-15 12:43:53,468 [INFO] tensorflow: global_step/sec: 2.78304
INFO:tensorflow:global_step/sec: 2.26568
2021-06-15 12:43:55,233 [INFO] tensorflow: global_step/sec: 2.26568
2021-06-15 12:43:55,632 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 7, 0.00s/step
Epoch 6/120
=========================
Validation cost: 0.000950
Mean average_precision (in %): 0.0000
class name      average precision (in %)
------------  --------------------------
car                                    0
Median Inference Time: 0.073559
INFO:tensorflow:epoch = 6.0, loss = 0.00092883315, step = 282 (17.548 sec)
2021-06-15 12:44:08,417 [INFO] tensorflow: epoch = 6.0, loss = 0.00092883315, step = 282 (17.548 sec)
2021-06-15 12:44:08,418 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 6/120: loss: 0.00093 Time taken: 0:00:31.336126 ETA: 0:59:32.318347
INFO:tensorflow:global_step/sec: 0.287938
2021-06-15 12:44:09,125 [INFO] tensorflow: global_step/sec: 0.287938
INFO:tensorflow:global_step/sec: 2.52453
2021-06-15 12:44:10,709 [INFO] tensorflow: global_step/sec: 2.52453
INFO:tensorflow:global_step/sec: 2.29179
2021-06-15 12:44:12,455 [INFO] tensorflow: global_step/sec: 2.29179
INFO:tensorflow:epoch = 6.297872340425532, loss = 0.001180372, step = 296 (5.678 sec)
2021-06-15 12:44:14,094 [INFO] tensorflow: epoch = 6.297872340425532, loss = 0.001180372, step = 296 (5.678 sec)
INFO:tensorflow:global_step/sec: 2.43751
2021-06-15 12:44:14,096 [INFO] tensorflow: global_step/sec: 2.43751
2021-06-15 12:44:15,337 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 4.424
INFO:tensorflow:global_step/sec: 2.41797
2021-06-15 12:44:15,750 [INFO] tensorflow: global_step/sec: 2.41797
INFO:tensorflow:global_step/sec: 2.65673
2021-06-15 12:44:17,256 [INFO] tensorflow: global_step/sec: 2.65673
INFO:tensorflow:global_step/sec: 2.4071
2021-06-15 12:44:18,917 [INFO] tensorflow: global_step/sec: 2.4071
INFO:tensorflow:epoch = 6.595744680851063, loss = 0.0012715331, step = 310 (5.551 sec)
2021-06-15 12:44:19,645 [INFO] tensorflow: epoch = 6.595744680851063, loss = 0.0012715331, step = 310 (5.551 sec)
INFO:tensorflow:global_step/sec: 2.57286
2021-06-15 12:44:20,472 [INFO] tensorflow: global_step/sec: 2.57286
INFO:tensorflow:global_step/sec: 2.87132
2021-06-15 12:44:21,865 [INFO] tensorflow: global_step/sec: 2.87132
INFO:tensorflow:global_step/sec: 2.26257