I changed image_size from (832, 1344) to (416, 672). The new height and width are multiples of 32, as required. Training and evaluation run without error, but when I run inference, the following error occurs:
Using TensorFlow backend.
2020-09-15 16:08:05.046215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
[MaskRCNN] INFO : Running inference…
[MaskRCNN] INFO : Loading weights from /workspace/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-50000.tlt
Traceback (most recent call last):
File "/usr/local/bin/tlt-infer", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_infer.py", line 60, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/inference.py", line 288, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/inference.py", line 281, in infer
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/distributed_executer.py", line 484, in infer
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/scripts/evaluation.py", line 242, in infer
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
features, None, ModeKeys.PREDICT, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 548, in mask_rcnn_model_fn
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 392, in _model_fn
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/mask_rcnn_model.py", line 121, in build_model_graph
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 136, in __init__
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 146, in _generate_boxes
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/models/anchors.py", line 85, in _generate_anchor_boxes
ValueError: input size must be divided by the stride.
When I changed image_size back to the default (832, 1344), inference ran without error. Any help solving this issue is appreciated.
I am using the following TLT image: nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3
Config related to anchor:
min_level: 2
max_level: 6
num_scales: 1
aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
anchor_scale: 8
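For reference, the per-level values implied by this config can be tabulated with a few lines of Python. This is only a sketch: it assumes the usual feature-pyramid convention that level L has stride 2**L and a base anchor size of anchor_scale * stride, which I have not confirmed against the TLT source. `level_table` is a hypothetical helper, not a TLT function.

```python
def level_table(min_level, max_level, anchor_scale):
    """Return (level, stride, base_anchor_size) for each pyramid level.

    Assumption: stride = 2**level and base anchor size = anchor_scale * stride.
    """
    return [
        (level, 2 ** level, anchor_scale * 2 ** level)
        for level in range(min_level, max_level + 1)
    ]


# Values taken from the config above: min_level=2, max_level=6, anchor_scale=8.
for level, stride, size in level_table(2, 6, 8):
    print(f"level {level}: stride {stride}, base anchor size {size}")
```

Under these assumptions the top level (6) has stride 64 and a base anchor size of 512.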
If I understand the documentation correctly, the minimum image size should be base * 2^max_level = 8 * 2^6 = 512. Since I set image_size to (416, 672) and 416 < 512, that could be why inference fails. But if that is the case, why did training and evaluation run without any error?
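To narrow this down, here is a small check of both image sizes against each level's stride. It assumes the anchor generator requires every dimension to be divisible by 2**level for all levels from min_level to max_level. That is my guess from the wording of the error message, not something I have verified in the TLT code, and `bad_levels` is a hypothetical helper.

```python
def bad_levels(image_size, min_level=2, max_level=6):
    """List the (level, stride) pairs whose stride does not divide both dimensions.

    Assumption: the stride at each pyramid level is 2**level.
    """
    height, width = image_size
    failures = []
    for level in range(min_level, max_level + 1):
        stride = 2 ** level
        if height % stride != 0 or width % stride != 0:
            failures.append((level, stride))
    return failures


print(bad_levels((832, 1344)))  # default size
print(bad_levels((416, 672)))   # my reduced size
```

If that assumption holds, the default size passes at every level, while (416, 672) fails only at level 6: 416 = 6 * 64 + 32 and 672 = 10 * 64 + 32, so both are multiples of 32 but not of 64. That would point to a requirement of multiples of 2^max_level = 64 rather than 32, but it still does not explain why training and evaluation accept the smaller size.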