Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]

• Hardware (training on a system with an NVIDIA TITAN RTX (24 GB), Precision 7920 Tower with 32 GB memory)
• Network Type (Mask R-CNN, using the mapillary-vistas-dataset)

I am training Mask R-CNN with the ResNet-34 architecture.
I get the following error:
Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]
What could be the issue?
The config file used for training is attached:
maskrcnn_train_resnet34.txt (2.0 KB)
The full error output is:

INFO:tensorflow:Done calling model_fn.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 222, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 218, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 85, in run_executer
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py", line 399, in train_and_eval
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1490, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 713, in __init__
    h.begin()
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/hooks/pretrained_restore_hook.py", line 208, in begin
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/hooks/pretrained_restore_hook.py", line 113, in assign_from_checkpoint
ValueError: Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]

[MaskRCNN] ERROR   : Job finished with an uncaught exception: `FAILURE`
2022-05-18 08:46:27,717 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

#print("To resume training from a checkpoint, simply run the same training script. It will pick up from where it's left.")
#!tao mask_rcnn train -e $SPECS_DIR/maskrcnn_train_resnet50.txt \
#                     -d $USER_EXPERIMENT_DIR/experiment_dir_unpruned\
#                     -k $KEY \
#                     --gpus 1
print('Model for each epoch:')

How about running with resnet50?

And how did you download /workspace/tao-experiments/mask_rcnn/pretrained_resnet34/pretrained_instance_segmentation_vresnet34/resnet34.hdf5 ?
Could you share the link?

My mistake. I was using the pretrained ResNet-50 model with the ResNet-34 configuration. We can close this.
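For anyone who lands here with the same message: the two shapes in the error hold different numbers of elements, so one kernel cannot be reshaped into the other. A 1x1 first convolution in one shape and a 3x3 in the other is consistent with mixing a bottleneck-style backbone (such as ResNet-50) with a basic-block backbone (such as ResNet-34). Below is a minimal sketch of that arithmetic in plain Python. The helper names `element_count` and `find_shape_mismatches` are hypothetical and not part of TAO, and the shape dictionaries are filled in by hand from the error message rather than read from a real checkpoint.

```python
from math import prod


def element_count(shape):
    """Return the number of elements in an array of the given shape."""
    return prod(shape)


def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for every variable
    present in both mappings whose shapes differ."""
    return {
        name: (tuple(shape), tuple(checkpoint_shapes[name]))
        for name, shape in model_shapes.items()
        if name in checkpoint_shapes
        and tuple(checkpoint_shapes[name]) != tuple(shape)
    }


# Shapes copied from the error message. Which side belongs to the model and
# which to the pretrained file is an assumption made for illustration only.
lh_shape = (1, 1, 64, 64)
rh_shape = (3, 3, 64, 64)

print(element_count(lh_shape))  # 4096
print(element_count(rh_shape))  # 36864

mismatches = find_shape_mismatches(
    {"block_1a_conv_1/kernel": lh_shape},
    {"block_1a_conv_1/kernel": rh_shape},
)
for name, (model_shape, ckpt_shape) in mismatches.items():
    print(f"{name}: model {model_shape} vs checkpoint {ckpt_shape}")
```

If the element counts differ like this for an early convolution layer, it is worth checking that the pretrained weights file matches the backbone and depth set in the training config.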
