Lower FPS for the pruned MaskRCNN model compared to the unpruned model

Please use Mask2Former, as mentioned in TAO 5.5 (mask2former - NVIDIA Docs).
You can get started with the notebook or the TAO user guide.
It has higher accuracy. Some FPS numbers can also be found in Mask2Former | NVIDIA NGC. You can train a model and run a test.
For mask_rcnn, you can check whether a smaller backbone helps.

I cannot find the input size used for testing the inference FPS of the Mask2Former model on Mask2Former | NVIDIA NGC.
For mask_rcnn, I already tested the ‘resnet18’ backbone; the accuracy is comparable to ‘resnet50’, but surprisingly the FPS was lower for the ‘resnet18’ model. The training was done on TAO 3.22.05; I will rerun the training in the TAO 5.x container and check the results.

It is 800x800. You can find this info by checking the ONNX file with Netron.
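If you prefer a script over opening Netron, the same info can be read directly from the ONNX file with the onnx Python package. A minimal sketch (the file name is a placeholder):

```python
# Print the input tensor names and shapes of an ONNX model
# ("model.onnx" is a placeholder path; point it at your export).
import onnx

model = onnx.load("model.onnx")
for inp in model.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)  # expect something like [N, 3, 800, 800]
```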

Yes, please give TAO 5.x a try.
BTW, if you have bandwidth, you can run a benchmark against the TensorRT engine to profile the time cost of each layer, such as RoIAlign, etc.
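For a quick start, trtexec with --dumpProfile (plus --separateProfileRun) prints per-layer timings for an existing engine. Below is a rough Python sketch of the same idea using the TensorRT 8.x binding API and its built-in Profiler; the engine file name is a placeholder and a static-shape engine is assumed.

```python
# Per-layer timing sketch for a serialized TensorRT engine (TRT 8.x API).
# "maskrcnn_int8.engine" is a placeholder; adjust to your engine file.
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("maskrcnn_int8.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
context.profiler = trt.Profiler()  # default profiler prints per-layer times to stdout

# Allocate a device buffer for every binding (assumes static shapes).
bindings = []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    nbytes = trt.volume(shape) * np.dtype(dtype).itemsize
    bindings.append(int(cuda.mem_alloc(nbytes)))

# For realistic numbers, copy real input data in with cuda.memcpy_htod first.
context.execute_v2(bindings)  # layer-by-layer timings, incl. RoIAlign, go to stdout
```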

Can you guide me to any resource where I can learn to use this?

For profiling analysis, you can refer to the utility tool TREx (TensorRT Engine Explorer); a minimal usage sketch follows below:
1) Exploring NVIDIA TensorRT Engines with TREx | NVIDIA Technical Blog
2) TensorRT/tools/experimental/trt-engine-explorer at main · NVIDIA/TensorRT · GitHub

Or, for general performance profiling / analysis / debugging:
Nsight Systems (application-level)
Nsight Compute (kernel-level)
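To get per-layer data into TREx, you can export the engine graph and profile with trtexec and load them into an EnginePlan. The following is only a rough sketch: the engine/JSON file names are placeholders, and the latency column name follows the TREx blog example, so adjust to your setup.

```python
# Rough TREx sketch -- file names are placeholders.
# First export the engine graph and per-layer profile with trtexec, e.g.:
#   trtexec --loadEngine=maskrcnn.engine --profilingVerbosity=detailed \
#           --exportLayerInfo=maskrcnn.graph.json --exportProfile=maskrcnn.profile.json
from trex import EnginePlan

plan = EnginePlan("maskrcnn.graph.json", "maskrcnn.profile.json")
df = plan.df  # one pandas row per layer: name, type, latency, ...

# Show the layers that dominate the runtime (e.g. RoIAlign, NMS, conv blocks).
print(df.nlargest(10, "latency.pct_time"))
```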

I ran training with all 3 ResNet backbones; the results follow:

| Engine | Precision | TensorRT version | FPS |
|---|---|---|---|
| maskrcnn v11 unpruned | fp32 | 8.5.3-1+cuda11.8 | 25.9 |
| maskrcnn v11 unpruned | int8 | 8.5.1-1+cuda11.8 | 35.17 |
| maskrcnn v11 unpruned | int8 | 8.5.3-1+cuda11.8 | 36.88 |
| maskrcnn v11 pruned | no int8 | 8.5.3-1+cuda11.8 | 29.03 |
| maskrcnn v11 pruned | int8 | 8.5.1-1+cuda11.8 | 35.48 |
| maskrcnn v11 pruned | int8 | 8.5.3-1+cuda11.8 | 37.01 |
| maskrcnn v11 double_pruned | int8 | 8.5.3-1+cuda11.8 | 37.46 |
| maskrcnn v12 unpruned | int8 | 8.5.3-1+cuda11.8 | 37.07 |
| maskrcnn v12 pruned | int8 | 8.5.3-1+cuda11.8 | 36.94 |
| maskrcnn v13 unpruned | int8 | 8.5.3-1+cuda11.8 | 35.75 |
| maskrcnn v13 pruned | int8 | 8.5.3-1+cuda11.8 | 35.144 |

v11, v12, and v13 correspond to resnet50, resnet18, and resnet10 respectively. The data above shows that changing the backbone doesn't really improve FPS. There is a clear improvement between fp32 and int8, but neither pruning nor a smaller backbone helps. These results are surprising because both a smaller backbone and pruning produce a smaller model with fewer parameters.

Thanks for the info. It shows that other parts of the network account for the majority of the time, so a smaller model or fewer parameters does not play the most important role.