Exception on MaskRCNN with a different input size in tlt-export

Hi,

I trained MaskRCNN with the default notebook at the default input size of 832x1344, and everything worked fine.
Then I changed only the “image_size” param to 320x448 (both dimensions multiples of 64, as guided) and retrained; that also looked OK.
But when I issue the tlt-export command:

tlt-export mask_rcnn -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/model.step-58500.tlt -k $KEY -e $SPECS_DIR/maskrcnn_train_resnet50.txt --batch_size 1 --data_type int8 --cal_image_dir $DATA_DOWNLOAD_DIR/raw-data/val2017 --batches 10 --cal_cache_file $USER_EXPERIMENT_DIR/export/maskrcnn.cal --cal_data_file $USER_EXPERIMENT_DIR/export/maskrcnn.tensorfile

I get:
… output that looks ok …

Marking ['generate_detections', 'mask_head/mask_fcn_logits/BiasAdd'] as outputs
2020-12-09 11:04:21,971 [INFO] iva.mask_rcnn.export.exporter: Converted model was saved into /workspace/tlt-experiments/maskrcnn/experiment_dir_unpruned/model.step-58500.etlt
8it [00:00, 16.23it/s]
Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 185, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 263, in run_export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/mask_rcnn/export/exporter.py", line 528, in export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 206, in get_calibrator
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 317, in generate_tensor_file
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 380, in prepare_chunk
ValueError: operands could not be broadcast together with shapes (320,448) (3,) (320,448) 

Only the maskrcnn.tensorfile gets created; the .engine and .cal files are not created.
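(For context, this broadcast failure is the kind of error numpy raises when a 3-element per-channel value, such as a channel mean, is applied to an image array that has no channel axis, e.g. a single-channel calibration image. A minimal illustration, not the TLT code itself, with placeholder mean values:

    import numpy as np

    img = np.zeros((320, 448))                   # image loaded without a channel axis: shape (H, W)
    mean = np.array([103.939, 116.779, 123.68])  # per-channel mean, shape (3,)
    np.subtract(img, mean, out=img)
    # ValueError: operands could not be broadcast together with shapes (320,448) (3,) (320,448)

Note the three shapes in the message match the traceback above exactly.)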

Any ideas?

Thanks for the help!

Can you share your $SPECS_DIR/maskrcnn_train_resnet50.txt?

Hi,

Attached: maskrcnn_train_resnet50.txt (2.0 KB)

See https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/creating_experiment_spec.html#data-config:

“Image dimension as a tuple within quote marks. “(height, width)” indicates the dimension of the resized and padded input.”

Since the tuple is (height, width), the default notebook trains a model with height 832 and width 1344, i.e. a 1344x832 model in width-by-height terms rather than 832x1344. For the same reason, your spec trains a 448x320 (width x height) model.
Please check whether that matches your requirement.
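For reference, under the (height, width) convention the two specs read as follows (a sketch, following the data-config docs above; other fields omitted):

    data_config {
      image_size: "(832, 1344)"   # default notebook: 832 high, 1344 wide
    }

    data_config {
      image_size: "(320, 448)"    # your spec: 320 high, 448 wide
    }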

Hi,

Sorry, but I’m not following. My goal is to train and infer on images with width 448 and height 320.
I was under the impression that I just need to change the “image_size” param, then train and export.

Is this the case ?

If not, what else should I change? In particular:
– Do I have to change the actual input images for training?
– Can I use the checkpoint model (resnet50.hdf5) as a starting point?

Just to clarify: I trained with my modified config file (with the 320x448 image_size), and I looked at the script that creates the TFRecords and found no reference to image size.

Thanks for the help!

  1. No, you do not need to resize your dataset; the input images are resized and padded to image_size during training.
  2. Yes, you can use the pretrained model as a starting point.

To narrow this down, I suggest:
a) reading the MaskRCNN blog post: https://developer.nvidia.com/blog/training-instance-segmentation-models-using-maskrcnn-on-the-transfer-learning-toolkit/
b) running the default Jupyter notebook, but with image_size: “(320, 448)”, total_steps set to about 1000, and learning_rate_steps: “[100, 200, 600]” (a sketch of these spec changes follows below),

to see whether the same problem appears.
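For reference, a sketch of the suggested spec changes (field placement assumed to match the default notebook spec; everything else left at the defaults):

    data_config {
      image_size: "(320, 448)"    # (height, width): 320 high, 448 wide
      # ... other data_config fields unchanged ...
    }
    total_steps: 1000             # short run, just to check whether the problem reproduces
    learning_rate_steps: "[100, 200, 600]"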

Hi,

Thanks, it was a glitch on my side.

But now I have another question:
I trained for 100K iterations at the smaller resolution, and the results are really poor. Is there a reference spec file, with parameter values including the number of iterations, that makes the model give reasonable results?

Even a sample reference spec for the 832x1344 resolution that gives reasonable results would be highly appreciated.

(By “reasonable results” I mean, for example, bounding boxes of roughly the same accuracy as the ResNet18 sample models on a given image for common objects (car, person, etc.), and reasonable masks for those bounding boxes.)

Thanks for the help!

For the AP results, please refer to the spec in Poor metric results after retraining maskrcnn using TLT notebook - #16 by ghazni; it gives reasonable results.

Thanks.