MaskRCNN on Xavier - UffParser: Validator error Unsupported operation _GenerateDetection_TRT

Hello!

I am trying to get instance segmentation (MaskRCNN) working on Xavier using the NGC TLT docker container.

I have JetPack 4.4 installed with DeepStream 5.0 and TensorRT 7.1.3.

On my desktop computer I run this docker image:
nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

I go to the examples/maskrcnn directory and run through the example Jupyter notebook. I train on COCO using all the defaults. I stopped at 5000 iterations just to test the network. When I run the inference test in the notebook, it works fine.

I export the trained model to an .etlt file using the code in the notebook. I did the export for both FP32 and INT8, as separate files.
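
For reference, the export cells boil down to something like the commands below. The flags here are approximate (the notebook cell itself is authoritative), and MYKEY and the file names are just placeholders:

# FP32 export (approximate flags; see the notebook cell for the exact invocation)
tlt-export mask_rcnn -m model.step-5000.tlt -k MYKEY -o model.step-5000.fp32.etlt --data_type fp32

# INT8 export, which also writes the calibration cache that tlt-converter consumes later
tlt-export mask_rcnn -m model.step-5000.tlt -k MYKEY -o model.step-5000.int8.etlt --data_type int8 --cal_cache_file maskrcnn.cal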

I copy the model files and the INT8 calibration file to my Jetson.

I get the same error whether I run it through DeepStream or the latest tlt-converter. Here is the tlt-converter command and output for the INT8 model:

/home/nvidia/tlt-converter-7.1-dla/tlt-converter -k MYKEY -d 3,832,1344 -o generate_detections,mask_head/mask_fcn_logits/BiasAdd -c maskrcnn.cal -e trt.int8.engine -b 8 -m 1 -t int8 -i nchw model.step-5000.etlt

[ERROR] UffParser: Validator error: generate_detections: Unsupported operation _GenerateDetection_TRT
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
Segmentation fault (core dumped)

I get the same error from DeepStream:
deepstream-app -c deepstream_app_source1_mrcnn.txt

Using winsys: x11 
0:00:00.178862702  9399     0x201ab260 INFO                 nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1715> [UID = 1]: Trying to create engine from model files
ERROR: [TRT]: UffParser: Validator error: generate_detections: Unsupported operation _GenerateDetection_TRT
parseModel: Failed to parse UFF model
ERROR: failed to build network since parsing model errors.
ERROR: Failed to create network using custom network creation function
ERROR: Failed to get cuda engine from custom library API
0:00:02.100517302  9399     0x201ab260 ERROR                nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1735> [UID = 1]: build engine file failed
Segmentation fault (core dumped)

As I said, I am using the example config for training, but here is the file for easy reference:

seed: 123
use_amp: False
warmup_steps: 1000
checkpoint: "/workspace/tlt-experiments/maskrcnn/pretrained_resnet50/tlt_instance_segmentation_vresnet50/resnet50.hdf5"
learning_rate_steps: "[10000, 15000, 20000]"
learning_rate_decay_levels: "[0.1, 0.02, 0.01]"
total_steps: 25000
train_batch_size: 2
eval_batch_size: 4
num_steps_per_eval: 5000
momentum: 0.9
l2_weight_decay: 0.0001
warmup_learning_rate: 0.0001
init_learning_rate: 0.01

data_config{
    image_size: "(832, 1344)"
    augment_input_data: True
    eval_samples: 500
    training_file_pattern: "/workspace/tlt-experiments/data/train*.tfrecord"
    validation_file_pattern: "/workspace/tlt-experiments/data/val*.tfrecord"
    val_json_file: "/workspace/tlt-experiments/data/annotations/instances_val2017.json"

    # dataset specific parameters
    num_classes: 91
    skip_crowd_during_training: True
}

maskrcnn_config {
    nlayers: 50
    arch: "resnet"
    freeze_bn: True
    freeze_blocks: "[0,1]"
    gt_mask_size: 112
        
    # Region Proposal Network
    rpn_positive_overlap: 0.7
    rpn_negative_overlap: 0.3
    rpn_batch_size_per_im: 256
    rpn_fg_fraction: 0.5
    rpn_min_size: 0.

    # Proposal layer.
    batch_size_per_im: 512
    fg_fraction: 0.25
    fg_thresh: 0.5
    bg_thresh_hi: 0.5
    bg_thresh_lo: 0.

    # Faster-RCNN heads.
    fast_rcnn_mlp_head_dim: 1024
    bbox_reg_weights: "(10., 10., 5., 5.)"

    # Mask-RCNN heads.
    include_mask: True
    mrcnn_resolution: 28

    # training
    train_rpn_pre_nms_topn: 2000
    train_rpn_post_nms_topn: 1000
    train_rpn_nms_threshold: 0.7

    # evaluation
    test_detections_per_image: 100
    test_nms: 0.5
    test_rpn_pre_nms_topn: 1000
    test_rpn_post_nms_topn: 1000
    test_rpn_nms_thresh: 0.7

    # model architecture
    min_level: 2
    max_level: 6
    num_scales: 1
    aspect_ratios: "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]"
    anchor_scale: 8

    # localization loss
    rpn_box_loss_weight: 1.0
    fast_rcnn_box_loss_weight: 1.0
    mrcnn_weight_loss_mask: 1.0
}

I don’t know enough about this stuff to have a good idea how to solve it. Any ideas?

Thank you!

Where did you download the /home/nvidia/tlt-converter-7.1-dla/tlt-converter?
Can you paste the link?

Yes, I found it here:

which was linked from here:

Note that the same error happens with DeepStream, which isn't using that binary (I didn't add it to my PATH or anything).

Thanks!

Please try the version of tlt-converter 7.1 below:
https://developer.nvidia.com/tlt-converter-trt71

Thank you. I just tried it and I get the same error message as before.

/home/nvidia/Downloads/tlt-converter -k MY_KEY -d 3,832,1344 -o generate_detections,mask_head/mask_fcn_logits/BiasAdd -c maskrcnn.cal -e trt.int8.engine -b 8 -m 1 -t int8 -i nchw model.step-5000.etlt
[ERROR] UffParser: Validator error: generate_detections: Unsupported operation _GenerateDetection_TRT
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] Network must have at least one output
[ERROR] Network validation failed.
[ERROR] Unable to create engine
Segmentation fault (core dumped)

I re-flashed my Jetson just to make sure, again with JetPack 4.4 and DeepStream 5.0. I get the same error message when I run tlt-converter as above, using the binary linked in your post.

Can you try the experiments below?

  1. Try to generate an fp16 TRT engine instead of the int8 engine (see the example command at the end of this post).
  2. Try to follow GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream
### 2. Download Models
cd deepstream_tlt_apps/
wget https://nvidia.box.com/shared/static/8k0zpe9gq837wsr0acoy4oh3fdf476gq.zip -O models.zip
unzip models.zip
rm models.zip

Then use its mrcnn model instead and try again.
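
For experiment 1, the command should be essentially your earlier tlt-converter command with the calibration cache dropped and -t switched to fp16, something along these lines (key and file names as in your earlier command):

tlt-converter -k MYKEY -d 3,832,1344 -o generate_detections,mask_head/mask_fcn_logits/BiasAdd -e trt.fp16.engine -m 1 -t fp16 -i nchw model.step-5000.etlt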

Thank you.

I tried to generate an fp16 engine on my desktop, but it said fp16 is unsupported there. Perhaps you meant on the Jetson.

However, I moved on to step 2. I downloaded the models and copied the maskrcnn model from there to the Jetson. After updating the key, I tried to convert the model and got the same "Unsupported operation" error as before, but with MultilevelProposeROI instead of GenerateDetection_TRT. I searched for MultilevelProposeROI and found this document:
https://docs.nvidia.com/metropolis/TLT/pdf/Transfer-Learning-Toolkit-Getting-Started-Guide-IVA.pdf

In that PDF, I found the following sentence:

TensorRT OSS build is required for FasterRCNN, SSD, DSSD, YOLOv3, RetinaNet, and MaskRCNN models.

WOW. That's not what the deepstream_tlt_apps GitHub repo says here:

Instead that README says:

Prerequisites: TensorRT OSS (release/7.x branch) This is ONLY needed when running SSD , DSSD , RetinaNet and YOLOV3 models because BatchTilePlugin required by these models is not supported by TensorRT7.x native package.

Because that README says TensorRT OSS is only needed for SSD, DSSD, RetinaNet, and YOLOv3, I believed it was not needed for MaskRCNN. Furthermore, I did see a prebuilt binary for TensorRT OSS, but it said version 7.0.0 and my Jetson has 7.1.3, so I worried it would not be compatible anyway. (However, the library I built from source also reports 7.0.0, and it works.)

However, after building TensorRT OSS on the Jetson and installing the resulting plugin library, my problem appears to be solved. I can now convert both models on my Jetson: the one downloaded from box.com and the one from the NGC transfer learning docker container that I trained on COCO.
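
Roughly, the build on the Xavier looked like the steps below. The branch, GPU_ARCHS value, and library paths/version suffixes are specific to my JetPack 4.4 / TensorRT 7.1.3 setup and are written from memory, so treat the TensorRT OSS and deepstream_tlt_apps READMEs as the authoritative reference:

# clone the OSS branch given in the docs (the library I built reports version 7.0.0)
git clone -b release/7.0 https://github.com/NVIDIA/TensorRT.git
cd TensorRT && git submodule update --init --recursive

# build only the plugin library; GPU_ARCHS=72 targets Xavier, and TRT_LIB_DIR
# points at the TensorRT libraries installed by JetPack
mkdir -p build && cd build
cmake .. -DGPU_ARCHS=72 -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu -DTRT_BIN_DIR=`pwd`/out
make -j$(nproc) nvinfer_plugin

# back up the stock plugin, replace it with the OSS build (which contains the
# GenerateDetection_TRT and MultilevelProposeROI_TRT plugins), and refresh the linker cache
sudo cp /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3 ~/libnvinfer_plugin.so.7.1.3.bak
sudo cp out/libnvinfer_plugin.so.7.*.* /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.1.3
sudo ldconfig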

It seems the documentation in deepstream_tlt_apps should be updated. I had read that README multiple times, and if it had said “MaskRCNN” I would have tried this sooner.

The Jupyter notebook on NGC also does not mention that the Jetson needs TensorRT OSS, so that file should be updated as well. It is in the container nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 at “notebooks/examples/maskrcnn/maskrcnn.ipynb”.

Thank you for helping narrow down the issue.

Thanks for your info. Appreciate it!
I will sync with the internal team to get the documentation improved.

FYI, I am having a related problem: I cannot run MaskRCNN in the DeepStream NGC container because it does not have TensorRT OSS, and getting that to compile in the container has run me into a few roadblocks. I need CUDA, but installing the cuda package from apt gave me trouble, and the version of cmake available through apt is too old, though installing cmake from source is easy enough. I am still working through this, but needing to compile TensorRT OSS is a pain, to say the least. I will probably get this working; I just wanted to share this info!

Which DeepStream NGC container did you use? If it is blocking you, could you search for help on the DeepStream forum or Google?

Thanks. I was using the 5.0.1-20.09-devel deepstream container.

I never got compiling inside that container to work (after trying for a while). Instead, I noticed that the TensorRT OSS instructions include their own container to compile in (I did not need to do this previously on my Xavier). This time I compiled in the recommended container and then copied the resulting binaries into the right location in my DeepStream container. It took a fair bit of trial and error before I got the right settings (the TensorRT OSS git tag, CUDA version, and Ubuntu version must match the DeepStream container), but ultimately that worked for me. Now I can validate my DeepStream networks on video samples on my desktop, with no need to involve the Xavier all the time.
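
In case it helps anyone, the workflow ended up looking roughly like this. The container name, branch, and .so version suffix below are examples from memory and will differ depending on which DeepStream container and TensorRT version you have:

# on the desktop host: check out the OSS tag/branch matching the TensorRT
# version inside the DeepStream container, with submodules
git clone -b release/7.0 https://github.com/NVIDIA/TensorRT.git
cd TensorRT && git submodule update --init --recursive

# inside the build container recommended by the TensorRT OSS README (its
# Ubuntu and CUDA versions must match the DeepStream container):
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=/usr/lib/x86_64-linux-gnu -DTRT_BIN_DIR=`pwd`/out
make -j$(nproc) nvinfer_plugin

# back on the host: copy the rebuilt plugin over the stock one inside the
# running DeepStream container ("ds" is an example container name) and
# refresh its linker cache
docker cp build/out/libnvinfer_plugin.so.7.0.0 ds:/usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.7.0.0
docker exec ds ldconfig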

It took a full day of work to get it all sorted out. This seems like it could hamper anyone trying MaskRCNN with DeepStream! Happy it's working now, but it was pretty frustrating to work through.
