Hello,
I’m setting up a pipeline to train Mask2Former in TAO, and since tiling is recommended by NVIDIA and I’ve seen performance boosts from it, I want to include a tiling preprocessor. However, since Mask2Former requires COCO JSON annotations (segmentation + bbox) instead of segmentation mask PNGs, I need to write logic that tiles the images as well as the segmentations and bounding boxes associated with each tile. Do you happen to have a preprocessor that achieves this and preserves the integrity of the annotations? I’m finding there are many edge cases to account for, which makes the logic complex.
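For context, here is a minimal sketch of the kind of logic I mean (my own illustration, not TAO code): clipping a COCO segmentation polygon to a tile rectangle with Sutherland–Hodgman clipping, shifting it into tile-local coordinates, and recomputing the bbox from the clipped vertices. The function names are hypothetical.

```python
def clip_polygon_to_tile(poly, x0, y0, x1, y1):
    """Clip a polygon (list of (x, y) vertices) to the axis-aligned tile
    [x0, x1] x [y0, y1] via Sutherland-Hodgman, one tile edge at a time.
    Returns vertices shifted into tile-local coordinates, or [] if the
    polygon does not overlap the tile."""
    def clip_edge(pts, inside, intersect):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]
            if inside(prev) and inside(cur):
                out.append(cur)                    # edge fully inside
            elif inside(prev) and not inside(cur):
                out.append(intersect(prev, cur))   # leaving the tile
            elif not inside(prev) and inside(cur):
                out.append(intersect(prev, cur))   # entering the tile
                out.append(cur)
        return out

    def x_cross(a, b, x):  # intersection with a vertical tile edge
        t = (x - a[0]) / (b[0] - a[0])
        return (x, a[1] + t * (b[1] - a[1]))

    def y_cross(a, b, y):  # intersection with a horizontal tile edge
        t = (y - a[1]) / (b[1] - a[1])
        return (a[0] + t * (b[0] - a[0]), y)

    pts = list(poly)
    for inside, intersect in [
        (lambda p: p[0] >= x0, lambda a, b: x_cross(a, b, x0)),
        (lambda p: p[0] <= x1, lambda a, b: x_cross(a, b, x1)),
        (lambda p: p[1] >= y0, lambda a, b: y_cross(a, b, y0)),
        (lambda p: p[1] <= y1, lambda a, b: y_cross(a, b, y1)),
    ]:
        if not pts:
            return []
        pts = clip_edge(pts, inside, intersect)
    # shift into tile-local coordinates
    return [(x - x0, y - y0) for x, y in pts]

def bbox_from_polygon(pts):
    """COCO-style [x, y, w, h] bbox from polygon vertices."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
```

Even this sketch ignores the hard edge cases (multi-polygon instances split into disconnected pieces by a tile boundary, RLE-encoded segmentations, degenerate slivers along tile edges, instance `area` recomputation), which is why I’m hoping a maintained preprocessor already exists.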
A follow-up question: how much does bounding box accuracy matter for instance segmentation training in Mask2Former? If I’m tiling annotations and the resulting bounding boxes sometimes trace my tile/slice edges, will that affect model performance? My alternative is to compute the bounding boxes from the rasterized binary segmentation masks, which is very expensive.
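For reference, the rasterized-mask alternative I’m weighing looks roughly like this (my own sketch, with a hypothetical function name): render each instance to a binary mask, then take the extents of the nonzero pixels.

```python
import numpy as np

def bbox_from_mask(mask):
    """COCO-style [x, y, w, h] bbox from a binary mask (H, W) array.
    Returns None for an empty mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    # +1 so the box covers the full extent of the nonzero pixels
    return [int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)]
```

The per-mask cost is dominated by rasterizing every clipped polygon at tile resolution, which is why computing boxes directly from the clipped polygon vertices would be preferable if box tightness doesn’t matter much for training.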