Interpreting output of MaskRCNN from TLT to TRT

Hello! I’m trying to run inference on a TRT engine generated by following the TLT MaskRCNN tutorial from here:

I’m trying to interpret the output that TensorRT gives, but I can’t find any documentation. Here is the script I’m using to run inference.

Currently the model has been trained for inputs of shape 256×448, and I’m getting two outputs of dimensions (1, 600) and (1, 156800).

After some experimentation I figured out that the first 6 values of the (1, 600) output were the bounding box, class, and confidence of one object in the image, but the rest were just zeros, with -1 in the class field. (Does this mean the engine can only return up to 100 objects per image?)

I really don’t know where the mask is in the other output; I know 156800 is divisible by 448, but not by 256.

I’m on a Jetson Xavier NX running JetPack 4.5.1 with TensorRT 7.1.3.

Any help would be appreciated.

I reshaped the first output to (100, 6) and got a bounding box for each detection in the image (but they are not correct).
I also reshaped the mask output to (100, 2, 28, 28). I understand that 28×28 is the default mask size in the config of the TLT model, but the bounding boxes still aren’t correct, so there seems to be an issue there.
Do I need to run a sigmoid on the mask output? That is the only way I got reasonable results on the masks (the raw values oscillate between approximately -10 and 10).
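For reference, here is a minimal sketch of how those two reshaped buffers could be decoded. The field order [y1, x1, y2, x2, class, score] per detection row is an assumption inferred from the thread (verify it against your engine), as are the threshold values; the sigmoid on the mask logits matches what was observed above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_outputs(det_out, mask_out, num_dets=100, num_classes=2,
                   mask_size=28, conf_thresh=0.5):
    """Decode the two flat TensorRT output buffers.

    Assumed layouts (verify against your model/config):
      det_out:  (1, 600)    -> (100, 6) rows of [y1, x1, y2, x2, class, score]
      mask_out: (1, 156800) -> (100, 2, 28, 28) mask logits per class
    """
    dets = np.asarray(det_out).reshape(num_dets, 6)
    masks = np.asarray(mask_out).reshape(num_dets, num_classes,
                                         mask_size, mask_size)
    results = []
    for i, (y1, x1, y2, x2, cls, score) in enumerate(dets):
        # unused detection slots are padded with zeros and class = -1
        if cls < 0 or score < conf_thresh:
            continue
        # mask logits -> probabilities -> binary 28x28 mask
        mask_prob = sigmoid(masks[i, int(cls)])
        results.append({
            "box": (y1, x1, y2, x2),
            "class": int(cls),
            "score": float(score),
            "mask": mask_prob > 0.5,
        })
    return results
```

The binary 28×28 mask would then still need to be resized into the detection’s bounding box to overlay it on the original image.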

First, can you run `tlt mask_rcnn inference` successfully against your TRT engine?

Hmm, I did all the training with TLT on a desktop GPU and then built the engine on a Jetson. Should I install TLT on the Jetson to test the engine?

For running inference on a Jetson device, you can try to use DeepStream. TLT has officially released a GitHub repo for it.
See GitHub - NVIDIA-AI-IOT/deepstream_tlt_apps: Sample apps to demonstrate how to deploy models trained with TLT on DeepStream. PeopleSegNet is actually based on the MaskRCNN network; see PeopleSegNet — Transfer Learning Toolkit 3.0 documentation.
So, you can try to run with a similar spec to PeopleSegNet.

We are currently avoiding DeepStream on this project due to its steep learning curve.
In the meantime we have already deployed other models to TRT successfully, like YOLO; the main issue is MaskRCNN from TLT.

We just tested the MaskRCNN model converted to .trt with `tlt mask_rcnn export` against the inference option you gave us, and we are getting different results: the retrained TLT model has higher accuracy than the exported .engine. (That was on the desktop computer; on the Jetson I tested copying the pre-processing and post-processing from other MaskRCNN libraries, as seen in the uploaded file below, and got even worse results.)

See TLT different results - #9 by Morganh
So, please modify the preprocessing according to the hint in deepstream_tlt_apps/pgie_peopleSegNetv2_tlt_config.txt at release/tlt3.0 · NVIDIA-AI-IOT/deepstream_tlt_apps · GitHub


Similar to keras-applications/ at master · keras-team/keras-applications · GitHub
if mode == 'torch':
    x /= 255.
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]

Here are the preprocessing steps in TLT.

  1. For a given image, keep its aspect ratio and rescale it to the largest size that fits inside the rectangle specified by the target_size .
  2. Pad the rescaled image so that its height and width become the smallest multiples of the stride that are larger than or equal to the desired output dimensions.
  3. As mentioned above, scale pixel values to between 0 and 1 and then normalize each channel.
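The three steps above could be sketched roughly like this. The stride of 32 and target size of (256, 448) are assumptions taken from the thread, the mean/std values are the "torch" mode of keras-applications, and a simple nearest-neighbor resize is used only to keep the example dependency-free:

```python
import numpy as np

def nn_resize(img, new_h, new_w):
    """Nearest-neighbor resize of an HWC image (placeholder for a real resizer)."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def preprocess(img, target_size=(256, 448), stride=32):
    th, tw = target_size
    h, w = img.shape[:2]

    # 1. keep aspect ratio; rescale to the largest size fitting inside target_size
    scale = min(th / h, tw / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    resized = nn_resize(img, nh, nw)

    # 2. pad to the smallest multiple of stride >= the desired output dims
    ph = -(-th // stride) * stride
    pw = -(-tw // stride) * stride
    padded = np.zeros((ph, pw, 3), dtype=np.float32)
    padded[:nh, :nw] = resized

    # 3. scale to [0, 1], then normalize each channel (keras "torch" mode)
    x = padded / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std

    return x.transpose(2, 0, 1)  # HWC -> CHW for the engine input
```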

Refer to:

and Discrepancy between results from tlt-infer and trt engine - #8 by Morganh, change to inference_input = preprocess_input(inf_img.transpose(2, 0, 1), mode="torch")


Thanks Morganh, we will test these changes on Monday and get back to you.

Have a nice weekend.