Error loading 'conv1' when training resnet18_ssd?

taras.lishchenko · September 10, 2020, 4:49pm

Hi,

I’m having trouble running tlt-train for resnet18_ssd. I’d be glad to hear any suggestions :)
The dataset is already resized to 480x272 and converted to tfrecords with the following labels: person, windshield.

Full error message:

The shape of this layer does not match original model: conv1
Loading the model as a pruned model.
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/model_io.py", line 100, in load_model_as_pretrain
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 1047, in set_weights
    str(weights)[:50] + '...')
ValueError: You called `set_weights(weights)` on layer "conv1" with a  weight list of length 2, but the layer was expecting 1 weights. Provided weights: [array([[[[-1.46843329e-01, -3.80116850e-02,  2.28...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1607, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1024 and 512. Shapes are [1024,176] and [512,176]. for 'Assign_557' (op: 'Assign') with input shapes: [1024,176], [512,176].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 45, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py", line 248, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/train.py", line 100, in run_experiment
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/model_io.py", line 117, in load_model_as_pretrain
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/model_io.py", line 51, in load_model
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/utils/model_io.py", line 30, in get_model_with_input
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 287, in _deserialize_model
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2465, in batch_set_value
    assign_op = x.assign(assign_placeholder)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 2067, in assign
    self._variable, value, use_locking=use_locking, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/state_ops.py", line 227, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_state_ops.py", line 66, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1770, in __init__
    control_input_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1610, in _create_c_op
    raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 1024 and 512. Shapes are [1024,176] and [512,176]. for 'Assign_557' (op: 'Assign') with input shapes: [1024,176], [512,176].

Train command:

tlt-train ssd -e specs.txt -r out/ -k $KEY -m resnet18.hdf5 --gpus 1

TLT-container: v2.0_py3
Train Specs:

random_seed: 42
ssd_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5, 3.0, 1.0/3.0]"
  scales: "[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]"
  two_boxes_for_ar1: true
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  freeze_bn: false
  freeze_blocks: 0
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 100
  enable_qat: false
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 5e-5
      max_learning_rate: 2e-2
      soft_start: 0.15
      annealing: 0.8
    }
  }
  regularizer {
    type: L1
    weight: 3e-5
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 4
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 480
    output_image_height: 272
    output_image_channel: 3
    crop_right: 480
    crop_bottom: 272
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/tlt-traffic/data/tfrecords/kitti_trainval/*"
    image_directory_path: "/workspace/tlt-experiments/datasets/train"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "windshield"
    value: "windshield"
  }
  target_class_mapping {
    key: "person"
    value: "person"
  }
  validation_fold: 0
}

Thank you)

Morganh · September 11, 2020, 5:54am

Please resize your images and labels to a new resolution.
Your current resolution is 480x272, but 272 is not a multiple of 32.
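
Resizing the images also means rescaling the bounding boxes in the KITTI label files. The following is only a minimal sketch of that step, not part of TLT: the function name `scale_kitti_label_line` is hypothetical, it assumes the standard KITTI detection layout in which fields 4 to 7 hold the box as left, top, right, bottom in pixels, and the 480x288 target is just one possible choice (288 is a multiple of 32).

```python
def scale_kitti_label_line(line, sx, sy):
    """Scale the 2D bounding box of one KITTI detection label line.

    Assumes the standard KITTI field order, where fields 4..7 are
    bbox left, top, right, bottom in pixels. Other fields are kept as-is.
    """
    fields = line.split()
    left, top, right, bottom = (float(v) for v in fields[4:8])
    fields[4:8] = [
        f"{left * sx:.2f}",
        f"{top * sy:.2f}",
        f"{right * sx:.2f}",
        f"{bottom * sy:.2f}",
    ]
    return " ".join(fields)


# Example: going from 480x272 to 480x288 (an assumed target size).
old_w, old_h = 480, 272
new_w, new_h = 480, 288
sx, sy = new_w / old_w, new_h / old_h

# A made-up label line, for illustration only.
sample = "person 0.00 0 0.00 100.00 136.00 200.00 272.00 0 0 0 0 0 0 0"
print(scale_kitti_label_line(sample, sx, sy))
```

The images themselves would need to be resized with the same factors, and the tfrecords regenerated afterwards.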

Below is the requirement from the TLT user guide.

SSD

Input size: C * W * H (where C = 1 or 3, W >= 128, H >= 128, W, H are multiples of 32)

Image format: JPG, JPEG, PNG

Label format: KITTI detection
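
The input size rule quoted above can be checked before starting a training run. This is a rough sketch in plain Python; the helper names `check_ssd_input_size` and `nearest_valid_sizes` are made up for illustration and are not TLT functions.

```python
def check_ssd_input_size(width, height, channels=3):
    """Return a list of rule violations; an empty list means the size is valid."""
    problems = []
    if channels not in (1, 3):
        problems.append(f"channels {channels} must be 1 or 3")
    for name, value in (("width", width), ("height", height)):
        if value < 128:
            problems.append(f"{name} {value} is below 128")
        if value % 32 != 0:
            problems.append(f"{name} {value} is not a multiple of 32")
    return problems


def nearest_valid_sizes(value):
    """Round value down and up to a multiple of 32, clamped to the 128 minimum."""
    lower = max(128, (value // 32) * 32)
    upper = max(128, -(-value // 32) * 32)  # ceiling division
    return lower, upper


print(check_ssd_input_size(480, 272))  # ['height 272 is not a multiple of 32']
print(nearest_valid_sizes(272))        # (256, 288)
```

For the 480x272 dataset in this thread, the width already satisfies the rule, so only the height needs to change, for example to 256 or 288.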

taras.lishchenko · September 13, 2020, 6:05pm

Hi,

Thank you ;)

The problem was that I took the pre-trained model from detectnet_v2. After loading the one from object_detection instead, everything works (with a valid image size, of course).
