Unable to train | tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC

Hi there,
I'm trying to train an SSD model with a ResNet-10 backbone using TLT, but every time I get the following assertion error. Can you please help me out? Thanks.
Details:

ssd_train_resnet10_kitti.txt :

random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 10
  freeze_bn: false
  freeze_blocks: 0
}
training_config {
  batch_size_per_gpu: 16
  num_epochs: 80
  enable_qat: false
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-5
      max_learning_rate: 2e-2
      soft_start: 0.15
      annealing: 0.8
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 16
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  output_width: 300
  output_height: 300
  output_channel: 3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/training/label_2"
    image_directory_path: "/workspace/tlt-experiments/data/training/image_2"
  }
  include_difficult_in_training: true
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  validation_data_sources: {
    label_directory_path: "/workspace/tlt-experiments/data/val/label"
    image_directory_path: "/workspace/tlt-experiments/data/val/image"
  }
}

Command :

!echo tlt ssd train --gpus 1 --gpu_index=$GPU_INDEX \
    -e $SPECS_DIR/ssd_train_resnet10_kitti.txt \
    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
    -k $KEY \
    -m $USER_EXPERIMENT_DIR/pretrained_resnet10/tlt_pretrained_object_detection_vresnet10/resnet_10.hdf5
Error :

To run with multigpu, please change --gpus based on the number of available GPUs in your machine.
2021-06-03 05:31:16,622 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-06-03 05:31:24,565 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-06-03 05:31:24,565 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py:63: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-06-03 05:31:24,657 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py:63: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py:66: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-06-03 05:31:24,657 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py:66: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-06-03 05:31:24,795 [INFO] /usr/local/lib/python3.6/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/ssd/specs/ssd_train_resnet10_kitti.txt
2021-06-03 05:31:24,811 [INFO] __main__: Loading pretrained weights. This may take a while...
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2021-06-03 05:31:24,812 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2021-06-03 05:31:24,813 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2021-06-03 05:31:24,836 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

2021-06-03 05:31:25,312 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4185: The name tf.truncated_normal is deprecated. Please use tf.random.truncated_normal instead.

WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2021-06-03 05:31:27,048 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2021-06-03 05:31:27,194 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2021-06-03 05:31:27,194 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2021-06-03 05:31:27,609 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2021-06-03 05:31:28,158 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

2021-06-03 05:31:28,164 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2021-06-03 05:31:28,640 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2021-06-03 05:31:28,765 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

Initialize optimizer
WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:121: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

2021-06-03 05:31:37,531 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:121: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:122: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

2021-06-03 05:31:37,531 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:122: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:123: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

2021-06-03 05:31:37,532 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/tensor_utils.py:123: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.


Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
Input (InputLayer)              (None, 3, 300, 300)  0
conv1 (Conv2D)                  (None, 64, 150, 150) 9408        Input[0][0]
bn_conv1 (BatchNormalization)   (None, 64, 150, 150) 256         conv1[0][0]
activation_1 (Activation)       (None, 64, 150, 150) 0           bn_conv1[0][0]
block_1a_conv_1 (Conv2D)        (None, 64, 75, 75)   36864       activation_1[0][0]
block_1a_bn_1 (BatchNormalizati (None, 64, 75, 75)   256         block_1a_conv_1[0][0]
block_1a_relu_1 (Activation)    (None, 64, 75, 75)   0           block_1a_bn_1[0][0]
block_1a_conv_2 (Conv2D)        (None, 64, 75, 75)   36864       block_1a_relu_1[0][0]
block_1a_conv_shortcut (Conv2D) (None, 64, 75, 75)   4096        activation_1[0][0]
block_1a_bn_2 (BatchNormalizati (None, 64, 75, 75)   256         block_1a_conv_2[0][0]
block_1a_bn_shortcut (BatchNorm (None, 64, 75, 75)   256         block_1a_conv_shortcut[0][0]
add_1 (Add)                     (None, 64, 75, 75)   0           block_1a_bn_2[0][0]
                                                                 block_1a_bn_shortcut[0][0]
block_1a_relu (Activation)      (None, 64, 75, 75)   0           add_1[0][0]
block_2a_conv_1 (Conv2D)        (None, 128, 38, 38)  73728       block_1a_relu[0][0]
block_2a_bn_1 (BatchNormalizati (None, 128, 38, 38)  512         block_2a_conv_1[0][0]
block_2a_relu_1 (Activation)    (None, 128, 38, 38)  0           block_2a_bn_1[0][0]
block_2a_conv_2 (Conv2D)        (None, 128, 38, 38)  147456      block_2a_relu_1[0][0]
block_2a_conv_shortcut (Conv2D) (None, 128, 38, 38)  8192        block_1a_relu[0][0]
block_2a_bn_2 (BatchNormalizati (None, 128, 38, 38)  512         block_2a_conv_2[0][0]
block_2a_bn_shortcut (BatchNorm (None, 128, 38, 38)  512         block_2a_conv_shortcut[0][0]
add_2 (Add)                     (None, 128, 38, 38)  0           block_2a_bn_2[0][0]
                                                                 block_2a_bn_shortcut[0][0]
block_2a_relu (Activation)      (None, 128, 38, 38)  0           add_2[0][0]
block_3a_conv_1 (Conv2D)        (None, 256, 19, 19)  294912      block_2a_relu[0][0]
block_3a_bn_1 (BatchNormalizati (None, 256, 19, 19)  1024        block_3a_conv_1[0][0]
block_3a_relu_1 (Activation)    (None, 256, 19, 19)  0           block_3a_bn_1[0][0]
block_3a_conv_2 (Conv2D)        (None, 256, 19, 19)  589824      block_3a_relu_1[0][0]
block_3a_conv_shortcut (Conv2D) (None, 256, 19, 19)  32768       block_2a_relu[0][0]
block_3a_bn_2 (BatchNormalizati (None, 256, 19, 19)  1024        block_3a_conv_2[0][0]
block_3a_bn_shortcut (BatchNorm (None, 256, 19, 19)  1024        block_3a_conv_shortcut[0][0]
add_3 (Add)                     (None, 256, 19, 19)  0           block_3a_bn_2[0][0]
                                                                 block_3a_bn_shortcut[0][0]
block_3a_relu (Activation)      (None, 256, 19, 19)  0           add_3[0][0]
block_4a_conv_1 (Conv2D)        (None, 512, 19, 19)  1179648     block_3a_relu[0][0]
block_4a_bn_1 (BatchNormalizati (None, 512, 19, 19)  2048        block_4a_conv_1[0][0]
block_4a_relu_1 (Activation)    (None, 512, 19, 19)  0           block_4a_bn_1[0][0]
block_4a_conv_2 (Conv2D)        (None, 512, 19, 19)  2359296     block_4a_relu_1[0][0]
block_4a_conv_shortcut (Conv2D) (None, 512, 19, 19)  131072      block_3a_relu[0][0]
block_4a_bn_2 (BatchNormalizati (None, 512, 19, 19)  2048        block_4a_conv_2[0][0]
block_4a_bn_shortcut (BatchNorm (None, 512, 19, 19)  2048        block_4a_conv_shortcut[0][0]
add_4 (Add)                     (None, 512, 19, 19)  0           block_4a_bn_2[0][0]
                                                                 block_4a_bn_shortcut[0][0]
block_4a_relu (Activation)      (None, 512, 19, 19)  0           add_4[0][0]
ssd_expand_block_1_conv_0 (Conv (None, 64, 19, 19)   32832       block_4a_relu[0][0]
ssd_expand_block_1_relu_0 (ReLU (None, 64, 19, 19)   0           ssd_expand_block_1_conv_0[0][0]
ssd_expand_block_1_conv_1 (Conv (None, 128, 10, 10)  73728       ssd_expand_block_1_relu_0[0][0]
ssd_expand_block_1_bn_1 (BatchN (None, 128, 10, 10)  512         ssd_expand_block_1_conv_1[0][0]
ssd_expand_block_1_relu_1 (ReLU (None, 128, 10, 10)  0           ssd_expand_block_1_bn_1[0][0]
ssd_expand_block_2_conv_0 (Conv (None, 64, 10, 10)   8256        ssd_expand_block_1_relu_1[0][0]
ssd_expand_block_2_relu_0 (ReLU (None, 64, 10, 10)   0           ssd_expand_block_2_conv_0[0][0]
ssd_expand_block_2_conv_1 (Conv (None, 128, 5, 5)    73728       ssd_expand_block_2_relu_0[0][0]
ssd_expand_block_2_bn_1 (BatchN (None, 128, 5, 5)    512         ssd_expand_block_2_conv_1[0][0]
ssd_expand_block_2_relu_1 (ReLU (None, 128, 5, 5)    0           ssd_expand_block_2_bn_1[0][0]
ssd_expand_block_3_conv_0 (Conv (None, 64, 5, 5)     8256        ssd_expand_block_2_relu_1[0][0]
ssd_expand_block_3_relu_0 (ReLU (None, 64, 5, 5)     0           ssd_expand_block_3_conv_0[0][0]
ssd_expand_block_3_conv_1 (Conv (None, 128, 3, 3)    73728       ssd_expand_block_3_relu_0[0][0]
ssd_expand_block_3_bn_1 (BatchN (None, 128, 3, 3)    512         ssd_expand_block_3_conv_1[0][0]
ssd_expand_block_3_relu_1 (ReLU (None, 128, 3, 3)    0           ssd_expand_block_3_bn_1[0][0]
ssd_expand_block_4_conv_0 (Conv (None, 64, 3, 3)     8256        ssd_expand_block_3_relu_1[0][0]
ssd_expand_block_4_relu_0 (ReLU (None, 64, 3, 3)     0           ssd_expand_block_4_conv_0[0][0]
ssd_expand_block_4_conv_1 (Conv (None, 128, 2, 2)    73728       ssd_expand_block_4_relu_0[0][0]
ssd_expand_block_4_bn_1 (BatchN (None, 128, 2, 2)    512         ssd_expand_block_4_conv_1[0][0]
ssd_expand_block_4_relu_1 (ReLU (None, 128, 2, 2)    0           ssd_expand_block_4_bn_1[0][0]
ssd_conf_0 (Conv2D)             (None, 24, 38, 38)   27672       block_2a_relu[0][0]
ssd_conf_1 (Conv2D)             (None, 24, 19, 19)   110616      block_4a_relu[0][0]
ssd_conf_2 (Conv2D)             (None, 24, 10, 10)   27672       ssd_expand_block_1_relu_1[0][0]
ssd_conf_3 (Conv2D)             (None, 24, 5, 5)     27672       ssd_expand_block_2_relu_1[0][0]
ssd_conf_4 (Conv2D)             (None, 24, 3, 3)     27672       ssd_expand_block_3_relu_1[0][0]
ssd_conf_5 (Conv2D)             (None, 24, 2, 2)     27672       ssd_expand_block_4_relu_1[0][0]
permute_1 (Permute)             (None, 38, 38, 24)   0           ssd_conf_0[0][0]
permute_2 (Permute)             (None, 19, 19, 24)   0           ssd_conf_1[0][0]
permute_3 (Permute)             (None, 10, 10, 24)   0           ssd_conf_2[0][0]
permute_4 (Permute)             (None, 5, 5, 24)     0           ssd_conf_3[0][0]
permute_5 (Permute)             (None, 3, 3, 24)     0           ssd_conf_4[0][0]
permute_6 (Permute)             (None, 2, 2, 24)     0           ssd_conf_5[0][0]
conf_reshape_0 (Reshape)        (None, 8664, 1, 4)   0           permute_1[0][0]
conf_reshape_1 (Reshape)        (None, 2166, 1, 4)   0           permute_2[0][0]
conf_reshape_2 (Reshape)        (None, 600, 1, 4)    0           permute_3[0][0]
conf_reshape_3 (Reshape)        (None, 150, 1, 4)    0           permute_4[0][0]
conf_reshape_4 (Reshape)        (None, 54, 1, 4)     0           permute_5[0][0]
conf_reshape_5 (Reshape)        (None, 24, 1, 4)     0           permute_6[0][0]
mbox_conf (Concatenate)         (None, 11658, 1, 4)  0           conf_reshape_0[0][0]
                                                                 conf_reshape_1[0][0]
                                                                 conf_reshape_2[0][0]
                                                                 conf_reshape_3[0][0]
                                                                 conf_reshape_4[0][0]
                                                                 conf_reshape_5[0][0]
ssd_loc_0 (Conv2D)              (None, 24, 38, 38)   27672       block_2a_relu[0][0]
ssd_loc_1 (Conv2D)              (None, 24, 19, 19)   110616      block_4a_relu[0][0]
ssd_loc_2 (Conv2D)              (None, 24, 10, 10)   27672       ssd_expand_block_1_relu_1[0][0]
ssd_loc_3 (Conv2D)              (None, 24, 5, 5)     27672       ssd_expand_block_2_relu_1[0][0]
ssd_loc_4 (Conv2D)              (None, 24, 3, 3)     27672       ssd_expand_block_3_relu_1[0][0]
ssd_loc_5 (Conv2D)              (None, 24, 2, 2)     27672       ssd_expand_block_4_relu_1[0][0]
before_softmax_permute (Permute (None, 4, 1, 11658)  0           mbox_conf[0][0]
permute_7 (Permute)             (None, 38, 38, 24)   0           ssd_loc_0[0][0]
permute_8 (Permute)             (None, 19, 19, 24)   0           ssd_loc_1[0][0]
permute_9 (Permute)             (None, 10, 10, 24)   0           ssd_loc_2[0][0]
permute_10 (Permute)            (None, 5, 5, 24)     0           ssd_loc_3[0][0]
permute_11 (Permute)            (None, 3, 3, 24)     0           ssd_loc_4[0][0]
permute_12 (Permute)            (None, 2, 2, 24)     0           ssd_loc_5[0][0]
ssd_anchor_0 (AnchorBoxes)      (None, 1444, 6, 8)   0           ssd_loc_0[0][0]
ssd_anchor_1 (AnchorBoxes)      (None, 361, 6, 8)    0           ssd_loc_1[0][0]
ssd_anchor_2 (AnchorBoxes)      (None, 100, 6, 8)    0           ssd_loc_2[0][0]
ssd_anchor_3 (AnchorBoxes)      (None, 25, 6, 8)     0           ssd_loc_3[0][0]
ssd_anchor_4 (AnchorBoxes)      (None, 9, 6, 8)      0           ssd_loc_4[0][0]
ssd_anchor_5 (AnchorBoxes)      (None, 4, 6, 8)      0           ssd_loc_5[0][0]
mbox_conf_softmax_ (Softmax)    (None, 4, 1, 11658)  0           before_softmax_permute[0][0]
loc_reshape_0 (Reshape)         (None, 8664, 1, 4)   0           permute_7[0][0]
loc_reshape_1 (Reshape)         (None, 2166, 1, 4)   0           permute_8[0][0]
loc_reshape_2 (Reshape)         (None, 600, 1, 4)    0           permute_9[0][0]
loc_reshape_3 (Reshape)         (None, 150, 1, 4)    0           permute_10[0][0]
loc_reshape_4 (Reshape)         (None, 54, 1, 4)     0           permute_11[0][0]
loc_reshape_5 (Reshape)         (None, 24, 1, 4)     0           permute_12[0][0]
anchor_reshape_0 (Reshape)      (None, 8664, 1, 8)   0           ssd_anchor_0[0][0]
anchor_reshape_1 (Reshape)      (None, 2166, 1, 8)   0           ssd_anchor_1[0][0]
anchor_reshape_2 (Reshape)      (None, 600, 1, 8)    0           ssd_anchor_2[0][0]
anchor_reshape_3 (Reshape)      (None, 150, 1, 8)    0           ssd_anchor_3[0][0]
anchor_reshape_4 (Reshape)      (None, 54, 1, 8)     0           ssd_anchor_4[0][0]
anchor_reshape_5 (Reshape)      (None, 24, 1, 8)     0           ssd_anchor_5[0][0]
mbox_conf_softmax (Permute)     (None, 11658, 1, 4)  0           mbox_conf_softmax_[0][0]
mbox_loc (Concatenate)          (None, 11658, 1, 4)  0           loc_reshape_0[0][0]
                                                                 loc_reshape_1[0][0]
                                                                 loc_reshape_2[0][0]
                                                                 loc_reshape_3[0][0]
                                                                 loc_reshape_4[0][0]
                                                                 loc_reshape_5[0][0]
mbox_priorbox (Concatenate)     (None, 11658, 1, 8)  0           anchor_reshape_0[0][0]
                                                                 anchor_reshape_1[0][0]
                                                                 anchor_reshape_2[0][0]
                                                                 anchor_reshape_3[0][0]
                                                                 anchor_reshape_4[0][0]
                                                                 anchor_reshape_5[0][0]
concatenate_1 (Concatenate)     (None, 11658, 1, 16) 0           mbox_conf_softmax[0][0]
                                                                 mbox_loc[0][0]
                                                                 mbox_priorbox[0][0]
ssd_predictions (Reshape)       (None, 11658, 16)    0           concatenate_1[0][0]
==================================================================================================
Total params: 5,768,416
Trainable params: 5,752,096
Non-trainable params: 16,320


2021-06-03 05:31:37,702 [INFO] __main__: Number of images in the training dataset: 6733
2021-06-03 05:31:37,702 [INFO] __main__: Number of images in the validation dataset: 748
Epoch 1/80
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py", line 313, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py", line 309, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py", line 237, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2671, in _call
    session)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2623, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1505, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1460, in __init__
    session._session, options_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
  [[{{node training_1/SGD/gradients/ssd_loc_0/convolution_grad/Conv2DBackpropInput}}]]
Traceback (most recent call last):
  File "/usr/local/bin/ssd", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/entrypoint/ssd.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-06-03 05:31:46,060 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Correction:
Command:
tlt ssd train --gpus 1 --gpu_index=$GPU_INDEX \
    -e $SPECS_DIR/ssd_train_resnet10_kitti.txt \
    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
    -k $KEY \
    -m $USER_EXPERIMENT_DIR/pretrained_resnet10/tlt_pretrained_object_detection_vresnet10/resnet_10.hdf5

Which GPU did you use?

Tesla K80

The GPU is a bit old. Please refer to the topic "Training on Custom Dataset using TLT".
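
Some background that may help here: as far as I know, `Conv2DCustomBackpropInputOp` is TensorFlow's CPU kernel for the convolution input gradient, and it only handles channels-last (NHWC) tensors. The model in the summary above is channels-first (for example `(None, 3, 300, 300)`), so this error usually suggests that the op ended up on the CPU, which can happen when the TensorFlow build in the container cannot use the installed GPU. Below is a minimal sketch for checking which devices TensorFlow sees, meant to be run inside the training container. `Device`, `visible_gpus`, and `check_tensorflow_gpus` are hypothetical helper names, not part of TLT.

```python
from collections import namedtuple

# Stand-in for TensorFlow's device records, used only for illustration.
Device = namedtuple("Device", ["name", "device_type"])


def visible_gpus(devices):
    """Return the names of the entries in `devices` whose device_type is "GPU".

    `devices` can be any iterable of objects with `name` and `device_type`
    attributes, such as the list returned by
    tensorflow.python.client.device_lib.list_local_devices().
    """
    return [d.name for d in devices if d.device_type == "GPU"]


def check_tensorflow_gpus():
    """List the GPUs TensorFlow can see (requires TensorFlow to be installed)."""
    # Imported lazily so visible_gpus() stays usable without TensorFlow.
    from tensorflow.python.client import device_lib
    return visible_gpus(device_lib.list_local_devices())
```

If `check_tensorflow_gpus()` returns an empty list, TensorFlow is not seeing the GPU, and changing the training spec is unlikely to make this error go away.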

Thank you very much. Let me try out a different GPU.

I am facing the same issue, and sometimes TensorFlow gets stuck. How can this be resolved if we cannot get newer GPUs?

Please check whether your setup meets the requirements.
For TLT 3.0, see the Transfer Learning Toolkit 3.0 documentation.
For TLT 2.0, see "Requirements and Installation" in the Transfer Learning Toolkit 2.0 documentation.
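
To compare a machine against those requirements, it helps to collect the GPU model and driver version first. Here is a minimal sketch, assuming `nvidia-smi` is on the PATH of the host. The function names `parse_gpu_csv` and `query_gpus` are hypothetical and only for illustration.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "name, driver_version" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_csv(text):
    """Parse the CSV lines printed by QUERY into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        if "," not in line:
            continue  # skip anything that is not a "name, driver" row
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({"name": name, "driver_version": driver})
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPU list, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` on the host gives the values to check against the hardware and driver sections of the requirements page for your TLT version.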