Mask R-CNN integration on Jetson Xavier AGX


We are encountering difficulties converting a Mask R-CNN model, retrained with TAO Toolkit v3.22.05 (on a Lambda server) and exported to .etlt format, into a TensorRT engine for use with DeepStream 6.0. The backbone is ResNet-50.


TensorRT Version:
GPU Type: Jetson Xavier AGX 16 GB
Nvidia Driver Version: L4T 32.6.1
CUDA Version: 10.2.300
CUDNN Version:
Operating System + Version: Jetpack 4.6
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Steps To Reproduce

The model was trained with input dims of 3x768x1280.
After following this tutorial: MaskRCNN — TAO Toolkit 3.22.05 documentation (TRT OSS downloaded and substituted, …), we encounter an error when converting the model from .etlt to .trt format:

$ ./tao-converter -k tlt_encode -d 3,768,1280 -o generate_detections,mask_fcn_logits/BiasAdd /home/custom_model.etlt

[ERROR ] 3: fc6/MatMul: kernel weights has count 67108864 but 12845056 was expected
[ERROR ] 3: fc6/MatMul: kernel weights has count 67108864 but 12845056 was expected
[ERROR ] 3: fc6/MatMul: kernel weights has count 67108864 but 12845056 was expected
[ERROR ] UffParser: Parser error: fc6/BiasAdd: The input to the scale layer is required to have a minimum of 3 dimensions

The same error appears whether we use the tao-converter tool or integrate the .etlt model directly into DeepStream.

Can you help us solve this problem? Do you have any idea of its source?

Looking forward to your feedback


This looks like a TAO Toolkit-related issue. We will move this post to the TAO Toolkit forum.



Could you please share the training spec file?

Thank you for your quick response. I've attached the training spec file.

spec_mrcnn_v1.txt (2.4 KB)

How did you generate custom_model.etlt? Did you save the log?
Or, if you ran all the steps in a Jupyter notebook, you can share the .ipynb file with me.

I used the terminal rather than Jupyter, and I no longer have access to the log file.
I used this command to export the model:

tao mask_rcnn export -m model-step-16120.tlt -k tlt_encode --gen_ds_config -o custom_model.etlt

Please try the command below and generate the .etlt again.

!tao mask_rcnn export -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/model.step-$NUM_STEP.tlt \
                      -k $KEY \
                      -e $SPECS_DIR/maskrcnn_retrain_resnet50.txt \
                      --batch_size 1 \
                      --data_type fp32 \
                      --engine_file $USER_EXPERIMENT_DIR/export/model.step-$NUM_STEP.engine

Thank you, I will try!

So, I am able to create the .etlt file, but the same error occurs when converting from .etlt to engine.

Here is the command line I used:

tao mask_rcnn export -m /workspace/tao-experiments/mask_rcnn/model.step-16120.tlt -k tlt_encode -e /workspace/tao-experiments/specs/spec_mrcnn.txt --batch_size 1 --data_type fp32 --engine_file /workspace/tao-experiments/mask_rcnn/model.step-16120.engine --log_file /workspace/tao-experiments/mask_rcnn/log_v2

It gave me this log file:
log_v2 (83.2 KB)

Then I tried to follow the mask_rcnn notebook obtained with:

wget --content-disposition -O

I retrained resnet50.hdf5 for a single epoch and got the same output:
log_export (83.2 KB)

Where did you download the file below?
checkpoint: "/workspace/tao-experiments/specs/resnet50.hdf5"

From NGC, with:

# Pull pretrained model from NGC
ngc registry model download-version nvidia/tao/pretrained_instance_segmentation:resnet50 --dest $LOCAL_EXPERIMENT_DIR/pretrained_resnet50

To narrow down the issue, please run the default notebook with its spec file, to check whether it works. Thanks.

Okay, thanks! I’ll give it a try and let you know as soon as I get some results

Please post an update once you have results, thanks.

So I was able to run the default notebook without any error. Then I compared the default spec file with mine and found that the mrcnn_resolution parameter was set to 64 in my spec file, whereas it was 28 in the original. After changing it to 28, I was able to export the model to engine format.
However, I don't know what influence this may have on the quality of the results.
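For reference, the change was in the maskrcnn_config section of my spec file. The fragment below is illustrative only (field names follow the TAO MaskRCNN spec format; the values other than mrcnn_resolution are placeholders from my setup and may differ from yours):

```
maskrcnn_config {
  nlayers: 50
  arch: "resnet"
  gt_mask_size: 112
  # ... other fields unchanged ...
  mrcnn_resolution: 28   # was 64; 28 matches the default spec and exports cleanly
}
```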

I can reproduce your error with mrcnn_resolution: 64. I will check further and get back to you.
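The two weight counts in the parser error are consistent with the following reading (this is an inference on my side, not confirmed in the logs: it assumes the exporter sizes the fc6 input as a square ROI of mrcnn_resolution/4 per side, with the usual 256-channel FPN features and 1024 fc6 output units):

```shell
# fc6 is fully connected: weight count = in_channels * roi_size * roi_size * fc_units.
# A 7x7 ROI (28/4) gives the count the UFF parser expected; a 16x16 ROI (64/4)
# gives the count actually found in the checkpoint trained with mrcnn_resolution 64.
expected=$((256 * 7 * 7 * 1024))
found=$((256 * 16 * 16 * 1024))
echo "expected=$expected"   # 12845056, matching the parser error
echo "found=$found"         # 67108864, matching the parser error
```

This would explain why exporting with mrcnn_resolution: 28 succeeds: the checkpoint's fc6 weights then match the size the UFF graph declares.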


Also, have you ever trained with mrcnn_resolution: 28, and what is the difference in final mAP between mrcnn_resolution: 64 and mrcnn_resolution: 28?

We will enable adaptive export for mrcnn_resolution. The fix will be available in the next release.

OK, thank you very much for your time and help
