Hello.
I’m trying semantic segmentation with TLT v3 on a custom dataset.
I use a ResNet-18 backbone; however, when I launch training I get a shape error.
To quote the documentation (Semantic Segmentation — Transfer Learning Toolkit 3.0 documentation):
"The size of the images need not necessarily be equal to the model input dimensions. The images are resized internally to model input dimensions."
However, the error I get is a shape-mismatch ValueError:
generator yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected
(see full log below).
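Neither shape in that error is the 320x320 set in the spec, so one plausible reading (an assumption, not confirmed by the TLT docs) is that the expected shape was taken from one of the dataset images and the data generator requires every image to match it before any resize happens. The log shows the resize call (tf.image.resize_images) in data_loader.py, inside `_preproc_samples`, which would run after the generator. A quick way to test this hypothesis is to count the distinct resolutions in the training folder. The sketch below is stdlib-only and reads dimensions straight from PNG/JPEG headers; `png_size`, `jpeg_size` and `size_histogram` are hypothetical helpers, not part of TLT, and it assumes a flat directory of .png/.jpg files.

```python
import struct
import sys
from collections import Counter
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Start-of-frame markers that carry the frame size (0xC4 DHT, 0xC8 JPG, 0xCC DAC excluded).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def png_size(data: bytes):
    """Return (height, width) from the PNG IHDR chunk, or None if not a PNG."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR" or len(data) < 24:
        return None
    width, height = struct.unpack(">II", data[16:24])
    return height, width


def jpeg_size(data: bytes):
    """Walk JPEG marker segments up to the first SOF; return (height, width) or None."""
    if data[:2] != b"\xff\xd8":
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # lost sync with the marker stream
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:  # markers without a length field
            i += 2
            continue
        if marker == 0xDA:  # start of scan reached without a frame header
            return None
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                return None
            # Layout after the length: precision (1 byte), height (2), width (2)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return height, width
        i += 2 + length
    return None


def image_size(path: Path):
    data = path.read_bytes()
    return png_size(data) or jpeg_size(data)


def size_histogram(directory) -> Counter:
    """Count how many images share each (height, width)."""
    counts = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            counts[image_size(path)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    hist = size_histogram(sys.argv[1])
    for size, n in hist.most_common(10):
        print(size, n)
    print(len(hist), "distinct sizes")
```

Usage would be something like `python3 sizes.py /data1/TrainVal_images/TrainVal_images/train_images/`; more than one distinct size would be consistent with the error above. Running it on the masks folder as well would show whether each mask matches its image.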
My training command:
tlt unet train --gpus=1 \
-e /workspace/tlt-experiments/specs/resnet18.txt \
-r /output/runs/resnet18_run1 \
-m /output/pretrained_resnet18/tlt_semantic_segmentation_vresnet18/resnet_18.hdf5 \
-n resnet18_lip \
-k $KEY
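If the images do differ in resolution, one possible workaround (an assumption on my side, not something the TLT documentation prescribes) is to resize images and masks offline to the spec's model_input_height x model_input_width (320x320) before training. The masks need care: their pixels are label IDs (0 to 19 here), so they must be resized with nearest-neighbour sampling rather than bilinear interpolation, which would invent in-between IDs. The pure-Python sketch below, with a hypothetical `resize_nearest` helper, only illustrates that sampling rule on a mask held as nested lists; on real PNG masks an imaging library's nearest-neighbour resize mode would do the same job.

```python
from typing import List

Mask = List[List[int]]


def resize_nearest(mask: Mask, out_h: int, out_w: int) -> Mask:
    """Resize a 2-D label mask with nearest-neighbour sampling.

    Every output pixel is copied from some input pixel, so the result
    only ever contains label IDs that already existed in the input.
    """
    in_h, in_w = len(mask), len(mask[0])
    # Map the centre of each output row/column back to the nearest input row/column.
    rows = [min(in_h - 1, int((y + 0.5) * in_h / out_h)) for y in range(out_h)]
    cols = [min(in_w - 1, int((x + 0.5) * in_w / out_w)) for x in range(out_w)]
    return [[mask[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    small = [[0, 1],
             [2, 3]]
    for row in resize_nearest(small, 4, 4):
        print(row)
```

Upscaling the 2x2 example to 4x4 repeats each label in a 2x2 block, and downscaling a 185x189 mask to 320x320 works the same way; no new label values can appear either way.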
Full traceback + logs:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-cqcmse4k because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/checkpoint_saver_hook.py:21: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/pretrained_restore_hook.py:23: The name tf.logging.WARN is deprecated. Please use tf.compat.v1.logging.WARN instead.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py:389: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
Loading experiment spec at /workspace/tlt-experiments/specs/resnet18.txt.
2021-04-23 13:55:05,679 [INFO] __main__: Loading experiment spec at /workspace/tlt-experiments/specs/resnet18.txt.
2021-04-23 13:55:05,681 [INFO] iva.unet.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/specs/resnet18.txt
2021-04-23 13:55:05,690 [INFO] root: Initializing the pre-trained weights from /output/pretrained_resnet18/tlt_semantic_segmentation_vresnet18/resnet_18.hdf5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
2021-04-23 13:55:05,696 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2021-04-23 13:55:05,705 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
2021-04-23 13:55:05,726 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
2021-04-23 13:55:05,731 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2021-04-23 13:55:06,470 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
2021-04-23 13:55:06,761 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
2021-04-23 13:55:06,761 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2021-04-23 13:55:06,920 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
2021-04-23 13:55:07,386 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.
INFO:tensorflow:Using config: {'_model_dir': '/output/runs/resnet18_run1', '_tf_random_seed': None, '_save_summary_steps': 5, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1
inter_op_parallelism_threads: 38
gpu_options {
allow_growth: true
visible_device_list: "0"
force_gpu_compatible: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9c90e25208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2021-04-23 13:55:07,409 [INFO] tensorflow: Using config: {'_model_dir': '/output/runs/resnet18_run1', '_tf_random_seed': None, '_save_summary_steps': 5, '_save_checkpoints_steps': None, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 1
inter_op_parallelism_threads: 38
gpu_options {
allow_growth: true
visible_device_list: "0"
force_gpu_compatible: true
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': None, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9c90e25208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
2021-04-23 13:55:07,502 [INFO] iva.unet.model.utilities: The total number of training samples 30462 and the batch size per GPU 64
2021-04-23 13:55:07,502 [INFO] iva.unet.model.utilities: Cannot iterate over exactly 30462 samples with a batch size of 64; each epoch will therefore take one extra step.
2021-04-23 13:55:07,502 [INFO] iva.unet.model.utilities: Steps per epoch taken: 476
Running for 1 Epochs
2021-04-23 13:55:07,502 [INFO] __main__: Running for 1 Epochs
INFO:tensorflow:Create CheckpointSaverHook.
2021-04-23 13:55:07,502 [INFO] tensorflow: Create CheckpointSaverHook.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2021-04-23 13:55:08,384 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:Entity <bound method Dataset._preproc_samples of <iva.unet.utils.data_loader.Dataset object at 0x7f9c90e25160>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset._preproc_samples of <iva.unet.utils.data_loader.Dataset object at 0x7f9c90e25160>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2021-04-23 13:55:08,420 [WARNING] tensorflow: Entity <bound method Dataset._preproc_samples of <iva.unet.utils.data_loader.Dataset object at 0x7f9c90e25160>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of <bound method Dataset._preproc_samples of <iva.unet.utils.data_loader.Dataset object at 0x7f9c90e25160>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:266: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2021-04-23 13:55:08,422 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:266: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
INFO:tensorflow:Calling model_fn.
2021-04-23 13:55:08,448 [INFO] tensorflow: Calling model_fn.
{'exec_mode': 'train', 'model_dir': '/output/runs/resnet18_run1', 'log_dir': None, 'batch_size': 64, 'learning_rate': 9.999999747378752e-05, 'crossvalidation_idx': None, 'max_steps': None, 'weight_decay': 3.000000026176508e-09, 'log_summary_steps': 10, 'warmup_steps': 0, 'augment': False, 'use_amp': False, 'use_trt': False, 'use_xla': False, 'loss': 'cross_entropy', 'epochs': 1, 'pretrained_weights_file': None, 'unet_model': <iva.unet.model.unet_model.UnetModel object at 0x7f9acd08d668>, 'key': 'bTRybTg2YXJ0ZmludnU5Yzc1Y2dqcXVldDE6YTA4NzdlNzAtYWFjNS00MDk4LWJlNDctZjMwODZmNGIxY2Ew', 'experiment_spec': random_seed: 42
dataset_config {
dataset: "custom"
input_image_type: "color"
train_images_path: "/data1/TrainVal_images/TrainVal_images/train_images/"
train_masks_path: "/data1/TrainVal_parsing_annotations/TrainVal_parsing_annotations/train_segmentations/"
val_images_path: "/data1/TrainVal_images/TrainVal_images/val_images/"
val_masks_path: "/data1/TrainVal_parsing_annotations/TrainVal_parsing_annotations/val_segmentations/"
data_class_config {
target_classes {
name: "Background"
mapping_class: "Background"
}
target_classes {
name: "Hat"
label_id: 1
mapping_class: "Hat"
}
target_classes {
name: "Hair"
label_id: 2
mapping_class: "Hair"
}
target_classes {
name: "Glove"
label_id: 3
mapping_class: "Glove"
}
target_classes {
name: "Sunglasses"
label_id: 4
mapping_class: "Sunglasses"
}
target_classes {
name: "UpperClothes"
label_id: 5
mapping_class: "UpperClothes"
}
target_classes {
name: "Dress"
label_id: 6
mapping_class: "Dress"
}
target_classes {
name: "Coat"
label_id: 7
mapping_class: "Coat"
}
target_classes {
name: "Socks"
label_id: 8
mapping_class: "Socks"
}
target_classes {
name: "Pants"
label_id: 9
mapping_class: "Pants"
}
target_classes {
name: "Jumpsuits"
label_id: 10
mapping_class: "Jumpsuits"
}
target_classes {
name: "Scarf"
label_id: 11
mapping_class: "Scarf"
}
target_classes {
name: "Skirt"
label_id: 12
mapping_class: "Skirt"
}
target_classes {
name: "Face"
label_id: 13
mapping_class: "Face"
}
target_classes {
name: "Left-arm"
label_id: 14
mapping_class: "Left-arm"
}
target_classes {
name: "Right-arm"
label_id: 15
mapping_class: "Right-arm"
}
target_classes {
name: "Left-leg"
label_id: 16
mapping_class: "Left-leg"
}
target_classes {
name: "Right-leg"
label_id: 17
mapping_class: "Right-leg"
}
target_classes {
name: "Left-shoe"
label_id: 18
mapping_class: "Left-shoe"
}
target_classes {
name: "Right-shoe"
label_id: 19
mapping_class: "Right-shoe"
}
}
}
model_config {
num_layers: 18
use_batch_norm: true
training_precision {
backend_floatx: FLOAT32
}
arch: "resnet"
all_projections: true
model_input_height: 320
model_input_width: 320
model_input_channels: 3
}
training_config {
batch_size: 64
regularizer {
type: L2
weight: 3.000000026176508e-09
}
optimizer {
adam {
epsilon: 9.99999993922529e-09
beta1: 0.8999999761581421
beta2: 0.9990000128746033
}
}
checkpoint_interval: 1
log_summary_steps: 10
learning_rate: 9.999999747378752e-05
loss: "cross_entropy"
epochs: 1
}
, 'seed': 42, 'benchmark': False, 'temp_dir': '/tmp/tmpc0a6e1w0', 'num_classes': 20, 'start_step': 0, 'checkpoint_interval': 1, 'phase': None}
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 320, 320) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 160, 160) 9472 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 160, 160) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 160, 160) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 80, 80) 36928 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 80, 80) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation) (None, 64, 80, 80) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 80, 80) 36928 block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 80, 80) 4160 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 80, 80) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 80, 80) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 80, 80) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation) (None, 64, 80, 80) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 80, 80) 36928 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 80, 80) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation) (None, 64, 80, 80) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 80, 80) 36928 block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 80, 80) 4160 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 80, 80) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 80, 80) 256 block_1b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 80, 80) 0 block_1b_bn_2[0][0]
block_1b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation) (None, 64, 80, 80) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 40, 40) 73856 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 40, 40) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation) (None, 128, 40, 40) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 40, 40) 147584 block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 40, 40) 8320 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 40, 40) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 40, 40) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 40, 40) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation) (None, 128, 40, 40) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 40, 40) 147584 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 40, 40) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation) (None, 128, 40, 40) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 40, 40) 147584 block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 40, 40) 16512 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 40, 40) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 40, 40) 512 block_2b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 40, 40) 0 block_2b_bn_2[0][0]
block_2b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation) (None, 128, 40, 40) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 20, 20) 295168 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 20, 20) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation) (None, 256, 20, 20) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 20, 20) 590080 block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 20, 20) 33024 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 20, 20) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 20, 20) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 20, 20) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation) (None, 256, 20, 20) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 20, 20) 590080 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 20, 20) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation) (None, 256, 20, 20) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 20, 20) 590080 block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 20, 20) 65792 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 20, 20) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 20, 20) 1024 block_3b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 20, 20) 0 block_3b_bn_2[0][0]
block_3b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation) (None, 256, 20, 20) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 20, 20) 1180160 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 20, 20) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation) (None, 512, 20, 20) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 20, 20) 2359808 block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 20, 20) 131584 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 20, 20) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 20, 20) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 20, 20) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation) (None, 512, 20, 20) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 20, 20) 2359808 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 20, 20) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation) (None, 512, 20, 20) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 20, 20) 2359808 block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 20, 20) 262656 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 20, 20) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 20, 20) 2048 block_4b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 20, 20) 0 block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation) (None, 512, 20, 20) 0 add_8[0][0]
__________________________________________________________________________________________________
conv2d_transpose_1 (Conv2DTrans (None, 256, 40, 40) 2097408 block_4b_relu[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 384, 40, 40) 0 conv2d_transpose_1[0][0]
block_2a_relu[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 256, 40, 40) 884992 concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_transpose_2 (Conv2DTrans (None, 128, 80, 80) 524416 conv2d_1[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 192, 80, 80) 0 conv2d_transpose_2[0][0]
block_1a_relu[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 128, 80, 80) 221312 concatenate_2[0][0]
__________________________________________________________________________________________________
conv2d_transpose_3 (Conv2DTrans (None, 64, 160, 160) 131136 conv2d_2[0][0]
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 128, 160, 160 0 conv2d_transpose_3[0][0]
bn_conv1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 64, 160, 160) 73792 concatenate_3[0][0]
__________________________________________________________________________________________________
conv2d_transpose_4 (Conv2DTrans (None, 64, 320, 320) 65600 conv2d_3[0][0]
__________________________________________________________________________________________________
concatenate_4 (Concatenate) (None, 67, 320, 320) 0 conv2d_transpose_4[0][0]
input_1[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 64, 320, 320) 38656 concatenate_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 20, 320, 320) 11540 conv2d_4[0][0]
==================================================================================================
Total params: 15,597,140
Trainable params: 15,585,492
Non-trainable params: 11,648
__________________________________________________________________________________________________
INFO:tensorflow:Done calling model_fn.
2021-04-23 13:55:12,956 [INFO] tensorflow: Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2021-04-23 13:55:15,231 [INFO] tensorflow: Graph was finalized.
INFO:tensorflow:Running local_init_op.
2021-04-23 13:55:16,562 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-04-23 13:55:16,690 [INFO] tensorflow: Done running local_init_op.
[GPU] Restoring pretrained weights from: /tmp/tmppr_q3_qt/model.ckpt-1
2021-04-23 13:55:17,564 [INFO] iva.unet.hooks.pretrained_restore_hook: Pretrained weights loaded with success...
INFO:tensorflow:Saving checkpoints for step-0.
2021-04-23 13:55:21,874 [INFO] tensorflow: Saving checkpoints for step-0.
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/training_hook.py:92: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
2021-04-23 13:55:27,027 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/hooks/training_hook.py:92: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 674, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))
ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
[[{{node PyFunc}}]]
[[IteratorGetNext]]
(1) Invalid argument: ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 674, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))
ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
[[{{node PyFunc}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_5081]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 403, in <module>
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 397, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 298, in run_experiment
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 217, in train_unet
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/train.py", line 104, in run_training_loop
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 674, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))
ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
[[{{node PyFunc}}]]
[[IteratorGetNext]]
(1) Invalid argument: ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
ret = func(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 674, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))
ValueError: `generator` yielded an element of shape (185, 189, 3) where an element of shape (380, 356, 3) was expected.
[[{{node PyFunc}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_5081]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
File "/usr/local/bin/unet", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/entrypoint/unet.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-04-23 15:55:32,027 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
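The decoder parameter counts in the model summary can be recomputed by hand to confirm what the network expects as input. This is a minimal sketch that assumes 3x3 kernels for the `Conv2D` layers and 4x4 kernels for the `Conv2DTranspose` layers. Those sizes are inferred from the reported counts, not read from the spec file, and `conv_params` and `LAYERS` are hypothetical names. All six counts match. `concatenate_4` joins 64 decoder channels with the 3-channel `input_1` at 320 x 320, which indicates the network input is 3 x 320 x 320 (channels first). So the (380, 356, 3) shape in the error does not appear to be the model input size.

```python
# Hypothetical cross-check of the Keras summary above.
# ASSUMPTION: Conv2D layers use 3x3 kernels, Conv2DTranspose layers use 4x4
# kernels (inferred from the reported counts, not from the spec file).


def conv_params(kernel: int, in_ch: int, out_ch: int, bias: bool = True) -> int:
    """Weights (kernel * kernel * in_ch * out_ch) plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


# (layer name, kernel size, input channels, output channels, params in summary)
LAYERS = [
    ("conv2d_2", 3, 192, 128, 221312),
    ("conv2d_transpose_3", 4, 128, 64, 131136),
    ("conv2d_3", 3, 128, 64, 73792),
    ("conv2d_transpose_4", 4, 64, 64, 65600),
    ("conv2d_4", 3, 67, 64, 38656),  # 67 = 64 decoder channels + 3 from input_1
    ("conv2d_5", 3, 64, 20, 11540),  # 20 output classes at 320 x 320
]


def check_summary():
    """Return (name, computed, reported) for every layer whose count differs."""
    mismatches = []
    for name, k, cin, cout, reported in LAYERS:
        computed = conv_params(k, cin, cout)
        if computed != reported:
            mismatches.append((name, computed, reported))
    return mismatches


if __name__ == "__main__":
    for name, k, cin, cout, reported in LAYERS:
        print(f"{name}: computed {conv_params(k, cin, cout)}, summary {reported}")
    print("mismatches:", check_summary())
```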
PS: I tried the same with a vgg16 backbone but got the same error.
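The error is raised inside `generator_py_func` in TensorFlow's `dataset_ops.py`, which checks each array the Python generator yields against the shape declared for the dataset. The sketch below re-implements that kind of compatibility rule in plain Python for illustration only. It is not TensorFlow's code, and `is_compatible` and `check_element` are hypothetical names. A `None` dimension accepts any size, while a fixed dimension must match exactly. Because (380, 356, 3) is fully specified, an image of any other size, such as the (185, 189, 3) one here, is rejected. The generator appears to yield images at their original size, so the check fails before any resize to the model input can take effect. That the expected shape was taken from one particular training image is an assumption, since the TLT data loader source is not shown. The same failure with vgg16 is consistent with the problem being in the input pipeline rather than the backbone.

```python
# Illustrative re-implementation of a shape-compatibility check (NOT the
# TensorFlow source): None in the expected shape matches any size, any
# other entry must match exactly.
from typing import Optional, Sequence, Tuple


def is_compatible(actual: Sequence[int], expected: Sequence[Optional[int]]) -> bool:
    """True if ranks agree and every fixed expected dimension matches."""
    if len(actual) != len(expected):
        return False
    return all(e is None or a == e for a, e in zip(actual, expected))


def check_element(actual: Tuple[int, ...], expected: Tuple[Optional[int], ...]) -> None:
    """Raise a ValueError worded like the one in the log on a mismatch."""
    if not is_compatible(actual, expected):
        raise ValueError(
            "`generator` yielded an element of shape %s where an element "
            "of shape %s was expected." % (actual, expected)
        )


if __name__ == "__main__":
    # Fully specified expected shape: a differently sized image is rejected.
    try:
        check_element((185, 189, 3), (380, 356, 3))
    except ValueError as err:
        print(err)
    # Unknown height and width: images of any size pass.
    check_element((185, 189, 3), (None, None, 3))
    print("ok")
```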