I’m trying to interpret the output that TensorRT gives, but I can’t find any info on it. Here is the script I’m using to run inference.
Currently the model has been trained for inputs of shape (256, 448), and I’m getting 2 outputs with dimensions (1, 600) and (1, 156800).
After some experimentation I worked out that the first 6 values of the (1, 600) output are the bounding box, class, and confidence of one object in the image, but the rest are just zeros, with -1 as the class. (Does this mean that inference is limited to 100 objects per image?)
I really don’t know where the mask is in the other output. I know its size is divisible by 448, but not by 256.
I’m on a Jetson Xavier NX running JetPack 4.5.1 with TensorRT 7.1.3.
Any help would be appreciated.
Update:
I reshaped the first result to (100, 6) and got a bounding box for each detection in the image (but they are not correct).
I also reshaped the mask output to (100, 2, 28, 28). I understand that 28x28 is the default mask size in the TLT model config, but the bounding boxes still aren’t correct; there seems to be an issue there.
Do I need to run a sigmoid on the mask output? That is the only way I got reasonable results from the mask (the raw values oscillate between approximately -10 and 10). maskrcnn_infer.py (3.7 KB)
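For reference, the reshaping described in this update can be sketched as follows. This is only a minimal sketch, not the contents of maskrcnn_infer.py. It assumes each detection row is laid out as [y1, x1, y2, x2, class_id, score], that the 2 mask channels correspond to class indices (background plus one object class), and that the mask values are logits, which would explain why a sigmoid gives reasonable results. The function name `decode_outputs` and the 0.5 thresholds are hypothetical.

```python
import numpy as np

MAX_DETECTIONS = 100  # 600 values / 6 values per detection
NUM_CLASSES = 2       # assumed: background + one object class
MASK_SIZE = 28        # mask resolution mentioned in the TLT config


def sigmoid(x):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def decode_outputs(det_flat, mask_flat, score_threshold=0.5):
    """Reshape the two flat engine outputs and keep the valid detections.

    Assumed row layout: [y1, x1, y2, x2, class_id, score].
    Padding rows carry class_id == -1 and are skipped.
    """
    dets = np.asarray(det_flat, dtype=np.float32).reshape(MAX_DETECTIONS, 6)
    masks = np.asarray(mask_flat, dtype=np.float32).reshape(
        MAX_DETECTIONS, NUM_CLASSES, MASK_SIZE, MASK_SIZE
    )

    results = []
    for det, mask_logits in zip(dets, masks):
        class_id = int(det[4])
        score = float(det[5])
        # Skip padding rows, out-of-range classes, and low-confidence detections.
        if not 0 <= class_id < NUM_CLASSES or score < score_threshold:
            continue
        # Pick the mask channel of the predicted class and convert logits to probabilities.
        prob = sigmoid(mask_logits[class_id])
        results.append(
            {
                "box": det[:4].copy(),
                "class_id": class_id,
                "score": score,
                "mask": prob > 0.5,  # 28x28 boolean mask, relative to the box
            }
        )
    return results
```

The 28x28 mask is relative to its bounding box, so it still has to be resized to the box size and pasted into the full image. If the boxes come out wrong, the assumed coordinate order or normalization is the first thing to check.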
We are avoiding DeepStream on this project for now because of its steep learning curve.
In the meantime we have already deployed other models, such as YOLO, to TensorRT successfully; the main issue is MaskRCNN from TLT.
We just tested the MaskRCNN model converted to .trt with “tlt mask_rcnn export”, using the inference option that you gave us, and we are getting different results: the TLT retrained model has higher accuracy than the exported .engine. (This is on the desktop computer. On the Jetson I tested copying the pre-processing and post-processing from other MaskRCNN libraries, as seen in the uploaded file below, and got even worse results.) maskrcnn_infer.py (7.0 KB)
For a given image, keep its aspect ratio and rescale the image to make it the largest rectangle that fits inside the rectangle specified by target_size.
Pad the rescaled image so that its height and width become the smallest multiple of the stride that is larger than or equal to the desired output dimension.
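The two steps above come down to a bit of arithmetic, sketched below. This is an illustration under assumptions, not the TLT implementation: the function name `resize_and_pad_shape` is hypothetical, the stride value is not given in this thread (32 is only used as an example), and the scaled size is rounded down.

```python
def resize_and_pad_shape(image_hw, target_hw, stride):
    """Return (scale, scaled_hw, padded_hw) for aspect-preserving resize + pad.

    image_hw:  (height, width) of the original image
    target_hw: (height, width) of target_size
    stride:    padding granularity (assumed; not given in the thread)
    """
    h, w = image_hw
    target_h, target_w = target_hw

    # Integer comparison of target_h / h against target_w / w avoids float rounding.
    if target_h * w <= target_w * h:
        # Height is the limiting side.
        scaled_h, scaled_w = target_h, (w * target_h) // h
        scale = target_h / h
    else:
        # Width is the limiting side.
        scaled_h, scaled_w = (h * target_w) // w, target_w
        scale = target_w / w

    # Smallest multiple of the stride that is >= the desired output dimension.
    padded_h = -(-target_h // stride) * stride
    padded_w = -(-target_w // stride) * stride
    return scale, (scaled_h, scaled_w), (padded_h, padded_w)


# Example: a 720x1280 frame into the (256, 448) input used in this thread.
scale, scaled_hw, padded_hw = resize_and_pad_shape((720, 1280), (256, 448), 32)
print(scale, scaled_hw, padded_hw)  # 0.35 (252, 448) (256, 448)
```

Boxes predicted on the padded input have to be divided by `scale` to get back to original image coordinates. Skipping that step, or stretching the image without keeping the aspect ratio, would shift the boxes.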
As mentioned above, the pre-processing scales pixel values to between 0 and 1 and then normalizes each channel.
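A minimal sketch of that scaling and per-channel normalization is below. The mean and standard deviation values are an assumption (the common ImageNet statistics), since the thread does not state which values the model was trained with, and the HWC to CHW transpose assumes the engine expects a channels-first input. The name `normalize_image` is hypothetical.

```python
import numpy as np

# Assumed values (ImageNet statistics); replace with the ones used in training.
CHANNEL_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
CHANNEL_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(image_uint8, mean=CHANNEL_MEAN, std=CHANNEL_STD):
    """Scale an HWC uint8 image to [0, 1], normalize per channel, return CHW float32."""
    scaled = image_uint8.astype(np.float32) / 255.0
    # Broadcasting applies mean/std along the last (channel) axis.
    normalized = (scaled - mean) / std
    # Assumed: the engine expects channels-first (CHW) input.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

If the channel order (RGB vs. BGR) or the statistics differ from what was used in training, the engine can still run but accuracy drops, which would be consistent with the gap described above.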