Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]

edit_or · May 18, 2022, 1:19am

• Hardware (Training on system with NVIDIA TITAN RTX(24G), Precision 7920 Tower with 32G memory)
• Network Type (Mask_rcnn using mapillary-vistas-dataset)

I am training Mask R-CNN with a ResNet-34 backbone.
Training fails with this error:
Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]
What could be the issue?
The training config file is attached.
maskrcnn_train_resnet34.txt (2.0 KB)
The full error output is:

INFO:tensorflow:Done calling model_fn.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 222, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 218, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/train.py", line 85, in run_executer
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/executer/distributed_executer.py", line 399, in train_and_eval
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1490, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1014, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 713, in __init__
    h.begin()
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/hooks/pretrained_restore_hook.py", line 208, in begin
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/hooks/pretrained_restore_hook.py", line 113, in assign_from_checkpoint
ValueError: Total size of new array must be unchanged for block_1a_conv_1/kernel lh_shape: [(1, 1, 64, 64)], rh_shape: [(3, 3, 64, 64)]

[MaskRCNN] ERROR   : Job finished with an uncaught exception: `FAILURE`
2022-05-18 08:46:27,717 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

#print("To resume training from a checkpoint, simply run the same training script. It will pick up from where it's left.")
#!tao mask_rcnn train -e $SPECS_DIR/maskrcnn_train_resnet50.txt \
#                     -d $USER_EXPERIMENT_DIR/experiment_dir_unpruned\
#                     -k $KEY \
#                     --gpus 1
print('Model for each epoch:')

Morganh · May 18, 2022, 2:49am

Does it work if you run with resnet50?

Also, how did you download /workspace/tao-experiments/mask_rcnn/pretrained_resnet34/pretrained_instance_segmentation_vresnet34/resnet34.hdf5?
Could you share the link?

edit_or · May 18, 2022, 3:42am

My mistake. I was loading the pretrained ResNet-50 model as the weights for a ResNet-34 network. We can close this.
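
For anyone who hits the same message, here is a minimal sketch of why the restore fails. ResNet-50 is built from bottleneck blocks whose first convolution is 1x1, while ResNet-34 uses basic blocks whose first convolution is 3x3, so block_1a_conv_1/kernel has a different number of elements in the two networks. The shapes below are copied from the error message; the helper names (can_assign, find_mismatches) and the example dictionaries are hypothetical and are not part of TAO Toolkit. Which side of the message is the checkpoint and which is the model graph is an assumption.

```python
import numpy as np

# Shapes copied from the error message in this thread.
lh_shape = (1, 1, 64, 64)  # 1x1 kernel, as in a ResNet-50 bottleneck block
rh_shape = (3, 3, 64, 64)  # 3x3 kernel, as in a ResNet-34 basic block


def can_assign(src_shape, dst_shape):
    """Return True if both shapes hold the same total number of elements."""
    return int(np.prod(src_shape)) == int(np.prod(dst_shape))


def find_mismatches(checkpoint_shapes, model_shapes):
    """List variable names present in both dicts whose sizes differ.

    Both arguments are hypothetical {name: shape} dictionaries that you
    would build yourself from the pretrained file and the model.
    """
    return [
        name
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and not can_assign(shape, model_shapes[name])
    ]


# Illustrative dictionaries; only block_1a_conv_1/kernel comes from the thread,
# the second entry is a made-up variable that happens to match.
checkpoint = {
    "block_1a_conv_1/kernel": lh_shape,
    "example_matching/kernel": (3, 3, 64, 64),
}
model = {
    "block_1a_conv_1/kernel": rh_shape,
    "example_matching/kernel": (3, 3, 64, 64),
}

print(int(np.prod(lh_shape)), int(np.prod(rh_shape)))
print(find_mismatches(checkpoint, model))
```

The two kernels hold 1*1*64*64 and 3*3*64*64 values respectively, so one cannot be reshaped into the other. A mismatch like this on the very first residual block is a strong hint that the pretrained file and the architecture set in the spec file do not belong to the same backbone.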

system · June 1, 2022, 3:42am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
