Accuracy and mIoU of 1.0 when validating Mask2Former

Hardware: RTX3080Ti
Network: Mask2Former
Docker image: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt

Spec file for training and validation:
exp_mask2former.txt (2.1 KB)

Issue:
During validation of Mask2Former, the accuracy and mIoU metrics are always 1.0. This is obviously incorrect and should be lower. The issue occurs when validating on COCO panoptic as well as COCO instance annotations.

Troubleshooting:
Looking at the source code of the TAO PyTorch backend, it appears that the dataset classes (tao_pytorch_backend/nvidia_tao_pytorch/cv/mask2former/dataloader/datasets.py at main · NVIDIA/tao_pytorch_backend · GitHub) used for COCO always convert the segmentations to a semantic segmentation map.
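
To illustrate what I mean, here is a rough, hypothetical sketch of that conversion (not the actual TAO dataloader code):

import numpy as np
from pycocotools.coco import COCO

def to_semantic_map(coco: COCO, img_id: int) -> np.ndarray:
    """Collapse the COCO instance annotations of one image into a semantic map."""
    info = coco.loadImgs(img_id)[0]
    sem = np.zeros((info["height"], info["width"]), dtype=np.uint8)  # 0 = background
    for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
        m = coco.annToMask(ann).astype(bool)
        sem[m] = ann["category_id"]  # instance identity is lost at this point
    return sem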

Also, the predicted segmentation map used to calculate the evaluation metrics always seems to be all zeros in the validation_step() method of the PyTorch Lightning model (tao_pytorch_backend/nvidia_tao_pytorch/cv/mask2former/model/pl_model.py at main · NVIDIA/tao_pytorch_backend · GitHub).

Is there a way to fix the evaluation for the Mask2Former model for instance segmentation?

Can you run successfully with the default notebook/dataset?
See tao_tutorials/notebooks/tao_launcher_starter_kit/mask2former/mask2former.ipynb at main · NVIDIA/tao_tutorials · GitHub
and
tao_tutorials/notebooks/tao_launcher_starter_kit/mask2former/specs/spec.yaml at main · NVIDIA/tao_tutorials · GitHub.

Please note that there are 2 kinds of notebooks as well. tao_tutorials/notebooks/tao_launcher_starter_kit/mask2former at main · NVIDIA/tao_tutorials · GitHub.

Thank you for your reply!

I ran the instance segmentation tutorial notebook (mask2former_inst.ipynb). I changed the batch size and number of workers for training and trained on the validation set to just speed up the process. See the spec file here:
spec_inst.txt (1.6 KB)

From this I got the following metrics:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│          all_acc          │    0.3793853521347046     │
│           mIoU            │    0.01788681373000145    │
│         val_loss          │    62.049774169921875     │
└───────────────────────────┴───────────────────────────┘

So it seems like it is working correctly.

I adjusted the tutorial to use my own custom COCO dataset, but again I got an accuracy and mIoU of 1.0, as seen below:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│          all_acc          │            1.0            │
│           mIoU            │            1.0            │
│         val_loss          │    59.251399993896484     │
└───────────────────────────┴───────────────────────────┘

The spec file used:
spec_inst_apples.txt (1.6 KB)

I then tested some other custom COCO datasets available online. From this, it seems that the problem only occurs when the number of classes is 1. I adjusted my custom apples dataset so that it had two classes, changing the annotations so that roughly half belonged to each class. I ran the same training, with only the number of classes adjusted:
spec_inst_apples2.txt (1.6 KB)

This produced an accuracy and mIoU that are not 1.0:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│          all_acc          │    0.6408777832984924     │
│           mIoU            │    0.3204388916492462     │
│         val_loss          │     70.84131622314453     │
└───────────────────────────┴───────────────────────────┘

Please follow the default notebook to train and run inference to confirm it is working. The number of training epochs is set to 50 by default. Your setting (training for only 1 epoch) is not enough.

Hello Morgan,

Thank you for your reply.

I do not care so much about how high the accuracy and mIoU of the model produced by the tutorial notebook are. I want to train a model on a custom dataset, but it seems that the evaluation script does not yield correct results when num_classes is set to 1 in the config file, because the accuracy and mIoU are always 1.0. As per my previous post, the default notebook does yield correct (although not high) accuracy and mIoU. By changing my custom dataset to two classes, the validation does work correctly; however, this is not desired.

I am seeking to validate my custom trained model which only has to predict a single class, instead of validating the standard model produced by the default notebook. Can you help me with this?

Can you increase the number of training epochs to check if it works? I am afraid the training has not converged yet.
I will also check further whether only 1 class is supported.

Running with only 1 class is supported.
Please ensure that “the category ids and annotation ids must be greater than 0.” See mask2former - NVIDIA Docs. Thanks.
If possible, please also share a minimal dataset with us so we can reproduce the issue.

Thank you for the reply.

I have made sure that the annotation and category ids are both 1 for my single class dataset, but the problem still persists.

Unfortunately, I cannot share my dataset. However, I am able to reproduce the result on the COCO dataset: I have converted the annotations of the COCO val2017 dataset so that they contain only one class and only two classes, respectively (a rough sketch of the conversion is shown after the attachments). Here are the zip files containing the JSON annotation files:
val_single_class.zip (3.1 MB)
val_double_class.zip (3.4 MB)
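
For reference, the single-class conversion was roughly of this form (a hypothetical sketch, not the exact script I used; the two-class version additionally remaps about half of the annotations to a second category id):

import json

with open("instances_val2017.json") as f:   # original COCO val2017 annotations
    coco = json.load(f)

coco["categories"] = [{"id": 1, "name": "object", "supercategory": "object"}]
for ann in coco["annotations"]:
    ann["category_id"] = 1                  # collapse every annotation onto the single class
                                            # annotation ids are left untouched (all > 0)

with open("val_single_class.json", "w") as f:
    json.dump(coco, f)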

And here are the configuration files used:
exp_single_class.zip (961 Bytes)
exp_double_class.zip (961 Bytes)
colormap.zip (300 Bytes)

The experiment configuration files are set up to both train and validate for a single epoch on the COCO val2017 dataset using the available pretrained model. When training, the validation at the end of the epoch shows an accuracy and mIoU of 1.0 on the single class dataset and an accuracy of 0.991 and mIoU of 0.934 on the double class dataset.

Note that you have to download the pretrained model and the COCO val2017 images. These are available here. You also have to set the file and folder paths in the config files to match your workspace setup.

Please let me know if you are able to reproduce the result or if anything is unclear.

Thanks for the info. I will try to reproduce. How many epochs did you set? Could you please share the full log if it is available? Thanks.

I only trained for a single epoch, since I started from the pretrained model. Here are the logs and experiment files from the output folder:

Single class:
single_class_logs.zip (2.1 KB)

Double class:
double_class_logs.zip (2.1 KB)

Training for only 1 epoch is not a meaningful comparison. Also, “an accuracy and mIoU of 1.0” is not much different from “an accuracy of 0.991 and mIoU of 0.934”. Both imply that the training has not converged yet, and the inference results should be similarly poor (almost all wrong). So I still suggest running a full training to compare. I am running it on my side as well and will update you when I have results. Thanks.

Hello, I’m running into the same issue. I’m training Mask2Former on my own COCO-formatted dataset with 1 class, for anywhere from 5 to 200 epochs. I’ve tried a variety of hyperparameters, and all of them result in an mIoU of 1 for every epoch, which makes me doubt the training progress.
I’m able to train Mask2Former using mmdetection with great results; however, when I try to replicate the process in TAO, the performance falls short and has been disappointing. I’ve attached the relevant files for your reference.
labelmap_inst.txt (130 Bytes)
spec_inst.txt (2.3 KB)
train_annotations.txt (3.2 MB)

Epoch 1

Epoch 200

Could you share the full log?
Did you run the default Mask2Former TAO notebook and get the expected result?
If yes, then the difference must come from your custom dataset or the training spec file.
More hints can be found in Fine-Tune the TAO v5.5.0 Mask2former Instance segmentation model on a custom dataset - #6 by Morganh.

Unfortunately, the shared thread didn’t help with my issue. Which log exactly are you looking for, so I can share it appropriately? For now, I’ve attached the auto-generated experiment YAML file as well as the training status file.
experiment_yaml.txt (4.8 KB)
status_yaml.txt (69.4 KB)

@Morganh Digging through the source code, it appears that this line of code has been commented out:

[:, 1:] drops the real class 0 and keeps only the “no-object” channel. With just one class, the model can then predict only “no-object”, so every pixel it outputs is indexed as 0. During the mIoU computation, class 0 is compared with the ground truth (also indexed 0 after reduce_zero_label=True), giving intersection = union, i.e. IoU = 1, for every image. At least, this is my theory.
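
A toy illustration of this theory (shapes assumed: the classification head outputs num_classes + 1 channels per query, with the real class at index 0):

import torch
import torch.nn.functional as F

num_queries, num_classes = 4, 1
mask_cls = torch.randn(num_queries, num_classes + 1)   # [Q, K+1] classification logits

scores = F.softmax(mask_cls, dim=-1)[:, 1:]            # the slicing in question
print(scores.shape)                                     # torch.Size([4, 1]) - a single channel remains
print(scores.argmax(dim=-1))                            # tensor([0, 0, 0, 0]) - the predicted index can only ever be 0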

However, making the change ([:, 1:] to [:, :-1]) inside the container and testing from within the container made no difference:

Testing DataLoader 0: 100%|█████████████████████████████████████████████████| 20/20 [00:03<00:00,  5.59it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│          all_acc          │            1.0            │
│           mIoU            │            1.0            │
│         val_loss          │    24.135944366455078     │
└───────────────────────────┴───────────────────────────┘

Finally, what solved it for me was setting reduce_zero_label to False here:

Testing DataLoader 0: 100%|█████████████████████████████████████████████████| 20/20 [00:02<00:00,  6.86it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│          all_acc          │    0.9454891085624695     │
│           mIoU            │    0.8966138958930969     │
│         val_loss          │    24.135944366455078     │
└───────────────────────────┴───────────────────────────┘

I found documentation on MMDetection explaining this:

Hopefully this will be fixed in future releases for binary segmentations.

Thanks for the info. But I ran the nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt container and checked /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/mask2former/model/pl_model.py:

$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash
root@11f18e6d31af:/opt/nvidia/tools# ls /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/mask2former/model/pl_model.py

Line 637 is not commented out:

632     def instance_inference(self, mask_cls, mask_pred):
633         """Post process for instance segmentation."""
634         # mask_pred is already processed to have the same shape as original input
635         image_size = mask_pred.shape[-2:]
636         # [Q, K]
637         scores = F.softmax(mask_cls, dim=-1)[:, 1:]

Is there any discrepancy here?

Mine is also the same. I’ve been doing more debugging in the container, and it seems that the model is not learning my “foreground” (non-background) pixels at all. When I print the unique predicted labels, I only get [0] and not [0, 1], which is what you would expect if the model were predicting both foreground and background. I can confirm this by looking at the segmentation mask input to the model and the predicted mask:
Input:


Prediction:

This all makes me think the issue is dataset-related, or related to how the pipeline maps labels to indices.
I understand the dataloader expects the COCO format, and I’ve reviewed my dataset and believe it to be correct, but I’m not sure whether I’ve missed something subtle that is causing this behavior. Are you able to verify the attached example validation annotation file for correctness? Things like the categories, the segmentation format, etc.
val_annotations.txt (248.1 KB)
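
For reference, a quick sanity check along those lines could look like this (hypothetical helper using pycocotools; the filename is assumed):

from pycocotools.coco import COCO

coco = COCO("val_annotations.json")        # assumed name of the attached annotation file

cats = coco.loadCats(coco.getCatIds())
print("categories:", cats)
assert all(c["id"] > 0 for c in cats), "category ids must be > 0"

anns = coco.loadAnns(coco.getAnnIds())
assert all(a["id"] > 0 for a in anns), "annotation ids must be > 0"
assert all(a["category_id"] > 0 for a in anns), "category_id must be > 0"

for a in anns[:5]:
    # segmentation should be either polygon lists or an RLE dict
    assert isinstance(a["segmentation"], (list, dict)), "unexpected segmentation format"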

P.S. I still think setting reduce_zero_label to False is sensible here, as my dataset has only 1 class.
P.P.S. After changing reduce_zero_label to False, I do get a value lower than 1; however, it stays the same for all epochs and then suddenly drops in the last epoch. Printing the values during each epoch, I get this:

iou [0.9436853 0.       ]
miou 0.47184265 

It seems to me that the model is not making any predictions for the second class (which is either the background or the foreground in this case), so one of the per-class IoU values is always 0 now.
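
For reference, the mIoU here is just the mean of the per-class IoU values, so a class that is never predicted caps it at half of the other class’s IoU:

import numpy as np

iou = np.array([0.9436853, 0.0])   # the per-class IoU values printed above
print(iou.mean())                   # 0.47184265 - matches the logged miou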

Update: I decided to let the model train for a few more epochs. Strangely, mIoU and accuracy go down as the loss goes down.


I suppose the good news is that I can now see a mask in the prediction; however, another strange finding is that, comparing the input segmentation to the model with its prediction, the label values seem to be flipped. Notice how the purple and yellow are swapped in these images.
Input

Prediction

As a reminder, according to the mmdetection documentation, a few changes are needed for binary segmentation, but I’m not sure how to make these changes in TAO. Reference: How to handle binary segmentation task in the following repo:

The reduce_zero_label flag controls whether annotation labels are decremented by 1 at data load time. For binary segmentation datasets with only two classes (background = 0 and foreground = 1), this parameter should be set to False. Setting it to True causes all labels to shift down by one, making background labels invalid and giving the model no proper foreground class to learn. So, please set it to False.
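
A minimal sketch of what this looks like at load time, following the MMSegmentation convention (not the exact TAO code):

import numpy as np

def load_label(label, reduce_zero_label, ignore_index=255):
    """Mimic the label remapping applied at data load time."""
    label = label.copy()
    if reduce_zero_label:
        label[label == 0] = ignore_index    # original background becomes "ignore"
        label = label - 1                   # every remaining label shifts down by one
        label[label == ignore_index - 1] = ignore_index
    return label

gt = np.array([0, 0, 1, 1])                              # binary mask: background=0, foreground=1
print(load_label(gt, reduce_zero_label=True))            # [255 255   0   0] - foreground collapses onto index 0
print(load_label(gt, reduce_zero_label=False))           # [0 0 1 1] - both classes preserved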

Also, it is not expected to set num_classes: 1. For binary segmentation, you must set num_classes=2. Please retry with this setting.

Hello Morgan, I’ve already tried the combination of num_classes=2 and reduce_zero_label=False with no luck. Moreover, reduce_zero_label should be a parameter that can be passed in if it needs to be set accordingly for a binary task, or this should be handled dynamically within the dataloader pipeline.