Training Instance Segmentation Models Using Mask R-CNN on the NVIDIA Transfer Learning Toolkit

Originally published at:

To convert pixels to actionable insights, computer vision relies on deep learning to provide an understanding of the environment. Object detection is a commonly used technique to identify individual objects in a frame, such as people or cars. While object detection is beneficial for certain applications, it falls short when you want to…


I’ve been experimenting with the Mask R-CNN model, but I have some questions about the achievable performance:


The post shows that it’s possible to achieve 15+ FPS using a ResNet50 backbone.

However, I’ve been trying to replicate the results (using the same model, backbone and dataset) and so far have only been able to achieve ~7 FPS (measured with trtexec). What am I missing?



Following the guide for Mask R-CNN training, I got a ModuleNotFound error ("No module named third party") with references to user vpraveen. Others are seeing the same error; details are in the thread "TLT V2.0 Classification".

Since this runs from the NVIDIA-provided Docker container, I’m not sure what to fix.

Any suggestions?