Could you please share the training spec file you used to get the above intermediate result? It looks promising. You can try training with more epochs/iterations. More experiments, such as changing the backbone, can also be tried.
Morgan, I’ve been able to train the same model on the same dataset with correct results on another platform using the same hyperparameters. For various reasons, I want to train this model on TAO so that I can export it in TensorRT or ONNX format for inference on Triton.
There is a fundamental problem here: the model is being trained on, and predicting, the background rather than the main object, as you can see in the predicted mask. Also, the results look more like semantic segmentation than instance segmentation. Despite this, when I convert the PyTorch model to TensorRT and run inference with the converted model, I do get masks, just not with good performance. These results are with 200 epochs and hyperparameters fine-tuned with ClearML HPO using the Optuna search method.
I expect accurate results like this:
It’s important to note that my mIoU (0.47) and accuracy (0.94) still stay flat throughout training while the train/val losses go down.
Here is my training spec for your reference as requested.
inst_spec.txt (2.5 KB)
May I know if you get a similar issue when running mask2former_inst.ipynb (tao_tutorials/notebooks/tao_launcher_starter_kit/mask2former/mask2former_inst.ipynb at main · NVIDIA/tao_tutorials · GitHub)?
I will try to reproduce it first.
Currently, the possible culprits are:
- Is the dataset large enough? You have only 328 training images.
- Can we set a larger backbone, for example, Swin-L?
- Does category_id start from 0 or 1? (A quick way to check is sketched below.)
- Is it related to the image resolution, since yours are 2450x500? How about cropping to square images?
Also, can you run evaluation against the training dataset as well?
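For the category_id question, a quick sanity check could look like the sketch below. This is just a minimal example assuming a standard COCO-format instance annotation file; the path is a placeholder you would replace with your own train.json.

import json

# Placeholder path -- point this at your own COCO train.json
ann_path = "Mask2former_data_COCO/annotations/train.json"

with open(ann_path) as f:
    coco = json.load(f)

cat_ids = sorted(c["id"] for c in coco["categories"])
used_ids = sorted({a["category_id"] for a in coco["annotations"]})

print("images:", len(coco["images"]))
print("annotations:", len(coco["annotations"]))
print("category ids defined in 'categories':", cat_ids)   # expected to start from 1
print("category ids used in 'annotations':", used_ids)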
Hello Morgan,
- To answer your questions 1 and 2, I’ve already trained this model with successful results on another platform (not TAO) using the same model architecture and the same dataset, so it can’t be either of those.
- My category_id starts from 1, as noted in the documentation.
- I’ve tried cropping the images to squares, and it makes no real difference to the overall performance (it only changes it slightly).
- I’ve run evaluation against the training set with similar results (i.e., still no prediction masks from the model).
As we synced offline, please run 500 epochs against your 2450x500 dataset. It will take about 3.4 hours. During training, please ignore the info “mIoU=1.000, all_acc=1.000”.
I can get a promising result on my side.
I did not change the format of the dataset; I trained using the Mask2former_data_COCO folder you shared.
After training, you need to run inference with the change mentioned in https://forums.developer.nvidia.com/t/tao-model-mask2former-inference-does-not-produce-overlay-images-or-masks-annotations/338399/9?u=morganh since your dataset is in .png format.
Below is the spec file.
$ cat 20250820_mask2former.yaml
results_dir: ./mask2former_inst/
dataset:
  contiguous_id: false #True
  label_map: /localhome/local-morganh/Mask2former_data_COCO/annotations/label_inst.json
  train:
    type: 'coco'
    name: "my_train"
    instance_json: "/localhome/local-morganh/Mask2former_data_COCO/annotations/train.json"
    img_dir: "/localhome/local-morganh/Mask2former_data_COCO/train"
    batch_size: 2 #16
    num_workers: 2
    target_size: [2450, 500]
    #target_size: [672, 672]
  val:
    type: 'coco'
    name: "my_val"
    instance_json: "/localhome/local-morganh/Mask2former_data_COCO/annotations/val.json"
    img_dir: "/localhome/local-morganh/Mask2former_data_COCO/val"
    batch_size: 1
    num_workers: 2
    target_size: [2450, 500]
    #target_size: [672, 672]
  test:
    img_dir: "/localhome/local-morganh/Mask2former_data_COCO/test"
    batch_size: 1
    num_workers: 2
    type: 'coco'
  augmentation:
    train_min_size: [500] #[640]
    train_max_size: 2450
    #train_crop_size: [512, 512] #[640, 640]
    train_crop_size: [500, 2450] #[640, 640]
    #train_crop_size: [672, 672] #[640, 640]
    test_min_size: 500 #640
    test_max_size: 2450
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.229, 0.224, 0.225]
train:
  #precision: 'fp16'
  precision: 'fp32'
  num_gpus: 1
  checkpoint_interval: 1
  validation_interval: 1
  num_epochs: 500 #200 #50
  clip_grad_norm: 0.4
  optim:
    lr_scheduler: "MultiStep"
    #milestones: [44, 48]
    #milestones: [120, 150]
    milestones: [350, 400]
    type: "AdamW"
    lr: 0.0003
    weight_decay: 0.06
    gamma: 0.1
evaluate:
  checkpoint: ./mask2former_inst/train/mask2former_model_latest.pth
  num_gpus: 1
  results_dir: ./mask2former_inst/evaluate
inference:
  checkpoint: ./mask2former_inst/train/mask2former_model_latest.pth
  num_gpus: 1
  gpu_ids: [0]
  results_dir: ./mask2former_inst/inference_test
model:
  object_mask_threshold: 0.01 #0.1
  overlap_threshold: 0.01 #0.8
  mode: "instance"
  backbone:
    #pretrained_weights: null
    pretrained_weights: "/localhome/local-morganh/swin_tiny_patch4_window7_224_22k.pth"
    type: "swin"
    swin:
      type: "tiny"
      window_size: 7
      ape: False
      pretrain_img_size: 224
  mask_former:
    num_object_queries: 100
  sem_seg_head:
    norm: "GN"
    num_classes: 2 #2
export:
  input_channel: 3
  input_width: 640
  input_height: 640
  opset_version: 17
  batch_size: -1 # dynamic batch size
  on_cpu: False
gen_trt_engine:
  gpu_id: 0
  tensorrt:
    data_type: fp16
    workspace_size: 4096
    min_batch_size: 1
    opt_batch_size: 1
    max_batch_size: 1
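As a side note for the ONNX/Triton goal: after running the export step with the export section above (3x640x640 input, dynamic batch), you can quickly sanity-check the resulting .onnx file with onnxruntime. This is only a minimal sketch; the file path is an assumption, so adjust it to whatever your export actually produced.

import onnxruntime as ort

# Placeholder path -- use the .onnx file produced by your export step
sess = ort.InferenceSession("mask2former_inst/export/model.onnx",
                            providers=["CPUExecutionProvider"])

# Print input/output names and shapes. The batch dimension should appear as a
# symbolic (dynamic) dimension, matching batch_size: -1 in the export section,
# and the spatial size should match the 640x640 export settings.
for i in sess.get_inputs():
    print("input :", i.name, i.shape, i.type)
for o in sess.get_outputs():
    print("output:", o.name, o.shape, o.type)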