Hello! I’m trying to run inference on a TensorRT engine generated by following the TLT MaskRCNN tutorial from here:
I’m trying to interpret the output that TensorRT gives, but I can’t find any info on its layout. The script I’m using to run inference is attached below.
Currently the model has been trained for inputs of shape (256, 448), and I’m getting 2 outputs of dimensions (1,600) and (1,156800).
After some experimentation I worked out that the first 6 values of the (1,600) output are the bounding box, class and confidence of one object in the image, but the rest are just zeros, with -1 as the class. (Does this mean that inference is limited to 100 objects per image?)
I really don’t know where the mask is in the other output. I know its size is divisible by 448, but not by 256.
I’m on a Jetson Xavier NX running JetPack 4.5.1 with TensorRT 7.1.3.
Any help would be appreciated.
I reshaped the first output to (100,6) and got a bounding box for each object in the image (but they are not correct).
I also reshaped the mask output to (100,2,28,28). I understand that 28x28 is the default mask size in the TLT model config, but the bounding boxes still aren’t correct; there seems to be an issue there.
Do I need to run a sigmoid on the mask output? That is the only way I got reasonable-looking masks (the raw values oscillate between approximately -10 and 10).
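For reference, here is a minimal sketch of that reshape step. It assumes each row holds four box values, then the class, then the confidence (the order in which I found them), and that unused slots are the ones with class -1. The function name `decode_detections` and the exact column order inside the box (for example y1, x1, y2, x2 versus x1, y1, x2, y2) are assumptions on my side, not something I found documented. A swapped box order would be one possible reason for boxes that look wrong.

```python
import numpy as np

def decode_detections(raw, num_fields=6):
    """Reshape a flat (1, N*6) detection buffer to (N, 6) and drop padded rows.

    Assumed row layout (hypothetical, not confirmed by any documentation
    quoted in this post): [box0, box1, box2, box3, class_id, score].
    """
    dets = np.asarray(raw, dtype=np.float32).reshape(-1, num_fields)
    # Padded slots were observed to carry class -1, so keep only class >= 0.
    keep = dets[:, 4] >= 0
    return dets[keep], keep

# Example with a fake buffer: one real detection, 99 padded slots.
raw = np.zeros((1, 600), dtype=np.float32)
raw[0, 4::6] = -1                       # class column of every slot
raw[0, :6] = [10, 20, 110, 220, 1, 0.9]
dets, keep = decode_detections(raw)
print(dets.shape, int(keep.sum()))
```

The `keep` mask is returned as well so that the same rows can later be selected from the mask output.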
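A sketch of what I mean by running a sigmoid on the mask output. It treats the (1,156800) buffer as 100 x 2 x 28 x 28 raw scores (100 * 2 * 28 * 28 = 156800), picks the channel matching each detection's class, and thresholds the result. Treating the values as logits, the 0.5 threshold, and the helper name `decode_masks` are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    # Logistic function; inputs here are roughly in [-10, 10], so no overflow.
    return 1.0 / (1.0 + np.exp(-x))

def decode_masks(raw_masks, class_ids, num_classes=2, mask_size=28, threshold=0.5):
    """Turn a flat mask buffer into per-detection probability and binary masks.

    Assumptions (not confirmed by documentation): the buffer is laid out as
    (num_detections, num_classes, mask_size, mask_size) and holds logits.
    """
    logits = np.asarray(raw_masks, dtype=np.float32).reshape(
        -1, num_classes, mask_size, mask_size)
    # Padded slots have class -1; clip so the indexing below stays valid.
    cls = np.clip(np.asarray(class_ids).astype(np.int64), 0, num_classes - 1)
    # For each detection, select the mask channel of its predicted class.
    per_det = logits[np.arange(logits.shape[0]), cls]
    probs = sigmoid(per_det)
    return probs, probs > threshold

# Example with a fake buffer: detection 0 has class 1 and a strongly positive mask.
raw = np.zeros((1, 156800), dtype=np.float32)
raw.reshape(100, 2, 28, 28)[0, 1] = 10.0
class_ids = np.full(100, -1)
class_ids[0] = 1
probs, binary = decode_masks(raw, class_ids)
print(probs.shape, bool(binary[0].all()))
```

Each 28x28 mask would still need to be resized to its bounding box and pasted into the full image, which depends on the boxes being decoded correctly first.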
maskrcnn_infer.py (3.7 KB)