Error running MaskRCNN inference after custom training

I changed the image_size from (832, 1344) to (416, 672); the new height and width are multiples of 32, as required. Training and evaluation run without error, but when I run inference, the following error occurs:

Using TensorFlow backend.
2020-09-15 16:08:05.046215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
[MaskRCNN] INFO : Running inference…
[MaskRCNN] INFO : Loading weights from /workspace/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-50000.tlt
Traceback (most recent call last):
File "/usr/local/bin/tlt-infer", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_infer.py", line 60, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/inference.py", line 288, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/inference.py", line 281, in infer
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/distributed_executer.py", line 484, in infer
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 242, in infer
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
features, None, ModeKeys.PREDICT, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 548, in mask_rcnn_model_fn
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 392, in _model_fn
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 121, in build_model_graph
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 136, in __init__
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 146, in _generate_boxes
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 85, in _generate_anchor_boxes
ValueError: input size must be divided by the stride.

When I changed the image_size back to the default (832, 1344), inference ran without error. Any help solving this issue is appreciated.

I used the following TLT image: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
Anchor-related config:

min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: “[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]”
anchor_scale: 8

If I understand the doc correctly, the minimum image size should be anchor_scale * 2 ^ max_level = 8 * 2 ^ 6 = 512. Since I set image_size to (416, 672) and 416 < 512, that could be why I get the error during inference. (The error message itself suggests a divisibility problem: the largest anchor stride is 2 ^ max_level = 64, and neither 416 nor 672 is divisible by 64.) But if that's the case, why did training and evaluation run without any error?
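To illustrate the constraint the traceback points at: a minimal sketch (not the actual TLT `anchors.py` source) of how an FPN-style anchor generator walks levels min_level..max_level with stride 2**level, and fails when the input size is not divisible by a stride. The function name `check_image_size` is my own, for illustration only.

```python
def check_image_size(height, width, min_level=2, max_level=6):
    """Sketch of the FPN anchor-generation constraint: every level's
    stride (2**level) must evenly divide both image dimensions."""
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        if height % stride != 0 or width % stride != 0:
            raise ValueError(
                "input size must be divided by the stride "
                "(level %d, stride %d)" % (level, stride))

# Default size: 832 and 1344 are both multiples of 64, so all levels pass.
check_image_size(832, 1344)

# Custom size: 416 % 64 == 32 and 672 % 64 == 32, so level 6 fails.
try:
    check_image_size(416, 672)
except ValueError as e:
    print(e)  # input size must be divided by the stride (level 6, stride 64)
```

This matches the later advice in the thread: with max_level 6, the dimensions need to be multiples of 64, not just 32.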

I am checking if I can reproduce.


Thank you, Morganh. Looking forward to hearing back from you.

I can reproduce on my side. Will sync with internal team.


Thank you for the update. Eagerly awaiting your next one.

Sorry for the late reply. This issue will be fixed in the next release.

Hi @Morganh, thank you for the update. For now, would you recommend keeping the default image_size for training, evaluation, and inference of the MaskRCNN model? Do you have a rough estimate of when the next release will be?

Yes, you can use the default image_size. Alternatively, you can change the width and height to multiples of 64; that is expected to work in the current release.
The next release is expected within this year.
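Following that advice, a small helper (my own, not part of TLT) to round a desired image size up to the nearest multiple of 64, so the custom size stays close to the intended (416, 672) while satisfying the current release's constraint:

```python
def round_up_to_multiple(size, multiple=64):
    """Round `size` up to the nearest multiple of `multiple`
    (64 here, i.e. 2**max_level with max_level = 6)."""
    return ((size + multiple - 1) // multiple) * multiple

height, width = 416, 672
print(round_up_to_multiple(height), round_up_to_multiple(width))  # 448 704
```

So (448, 704) would be the nearest valid size above (416, 672) under the multiple-of-64 rule.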
