SegFormer error with segmentation map

Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc): T4
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): SegFormer
• TLT Version: docker tag 5.5.0-pyt
• Training spec file(If have, please share here): attached at end
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

Command line:

tao model segformer train \
  -e $SPECS_DIR/train_leo.yaml \
  results_dir=$RESULTS_DIR/leo

I am trying to use the TAO SegFormer 5.5.0 model for semantic segmentation on 1032x772 color images. My ground-truth files are grayscale, but I am getting the following warning:

/usr/local/lib/python3.10/dist-packages/mmseg/datasets/transforms/formatting.py:81: UserWarning: Please pay attention your ground truth segmentation map, usually the segmentation map is 2D, but got (772, 1032, 3)

And later I get the following error:

File "/usr/local/lib/python3.10/dist-packages/mmseg/evaluation/metrics/iou_metric.py", line 186, in intersect_and_union
    pred_label = pred_label[mask]
IndexError: too many indices for tensor of dimension 2

This is confusing: even though my ground-truth masks are grayscale, the pipeline seems to be treating them as color.
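For reference, a quick way to inspect how one of the masks is actually stored on disk (the path below is a placeholder for one of my mask files):

from PIL import Image
import numpy as np

# Placeholder path; substitute an actual mask file
mask = Image.open('/data/masks/train/example_mask.png')
print(mask.mode)             # 'L' = single-channel grayscale; 'RGB' = 3-channel
print(np.array(mask).shape)  # expect (772, 1032); (772, 1032, 3) would match the warning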

Any help would be appreciated.

Duane Harkness
Rendered.ai

Additional information: here is my spec file.

train:
  exp_config:
    manual_seed: 49
  checkpoint_interval: 200
  logging_interval: 50
  max_iters: 1000
  resume_training_checkpoint_path: null
  trainer:
    find_unused_parameters: True
    sf_optim:
      lr: 0.00006
model:
  input_height: 772
  input_width: 1032
  pretrained_model_path: null
  backbone:
    type: "mit_b1"
dataset:
  input_type: "rgb"
  img_norm_cfg:
    mean:
      - 127.5
      - 127.5
      - 127.5
    std:
      - 127.5
      - 127.5
      - 127.5
  data_root: /data
  train_dataset:
    img_dir:
      - /data/images/train
    ann_dir:
      - /data/masks/train
    pipeline:
      augmentation_config:
        random_crop:
          cat_max_ratio: 0.75
        resize:
          ratio_range:
            - 0.5
            - 2.0
        random_flip:
          prob: 0.5
  palette:
    - seg_class: background
      rgb:
        - 0
        - 0
        - 0
      label_id: 0
      mapping_class: background
    - seg_class: cable
      rgb:
        - 255
        - 255
        - 255
      label_id: 1
      mapping_class: cable
  repeat_data_times: 500
  batch_size: 4
  workers_per_gpu: 1

@cshah @mkyu for visibility.

Could you please make sure it is truly a grayscale (single-channel) image?

  1. Use Pillow (PIL) to verify and convert:
from PIL import Image

mask = Image.open('mask.png')
if mask.mode != 'L':  # 'L' = 8-bit single-channel grayscale
    mask = mask.convert('L')
mask.save('grayscale_mask.png')

This code opens the image, checks if it’s already in grayscale mode (‘L’), and converts it if necessary.

  2. Utilize OpenCV:
import cv2

mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('grayscale_mask.png', mask)

This approach reads the image directly as grayscale and saves it, ensuring a single-channel output.

  3. Verify with NumPy: after loading the image, check its shape:
import numpy as np
import cv2

# Load with IMREAD_UNCHANGED so the on-disk channel layout is preserved
# (IMREAD_GRAYSCALE would force a 2D result and hide a 3-channel mask)
mask = cv2.imread('mask.png', cv2.IMREAD_UNCHANGED)
if mask is None or np.ndim(mask) != 2:
    raise ValueError("Mask is missing or not single-channel")

A true grayscale image should have only two dimensions.

Also, please make sure you have followed Data Annotation Format - NVIDIA Docs, especially: "Every pixel in the mask must have an integer value that represents the segmentation class label_id."
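To check this in bulk, here is a minimal sketch (assuming the mask directory and the two label_ids from your spec) that flags any mask that is not 2D or contains pixel values other than the expected label IDs:

import glob
import numpy as np
from PIL import Image

VALID_IDS = {0, 1}  # label_ids from your palette: background=0, cable=1

for path in glob.glob('/data/masks/train/*.png'):
    arr = np.array(Image.open(path))
    if arr.ndim != 2:
        print(f'{path}: not single-channel, shape {arr.shape}')
    else:
        extra = set(np.unique(arr).tolist()) - VALID_IDS
        if extra:
            print(f'{path}: unexpected pixel values {sorted(extra)}')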

I checked my masks using your suggestion and they were grayscale.

The main difference between the ISBI example and what I'm doing is that my images are RGB instead of grayscale. To see if that was the issue, I converted my images to grayscale and the error went away. Are RGB images supposed to work? I'd rather use RGB images than grayscale.

Yes, it is supported. For example, for the Cityscapes dataset, you can refer to the spec files shared in Pre-trained Segformer - CityScapes - Input dims appear to be 224x224 - #12 by Morganh and Question of Pretrained Segformer in NGC - #4 by Morganh.

As mentioned above, please pay attention to Data Annotation Format - NVIDIA Docs, especially: "Every pixel in the mask must have an integer value that represents the segmentation class label_id."
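If the masks turn out to be color-coded rather than label-indexed, here is a minimal conversion sketch (assuming the black/white palette from your spec; paths are placeholders, and please back up the originals first):

import glob
import numpy as np
from PIL import Image

# Palette from the spec: background (0,0,0) -> 0, cable (255,255,255) -> 1
COLOR_TO_ID = {(0, 0, 0): 0, (255, 255, 255): 1}

for path in glob.glob('/data/masks/train/*.png'):
    rgb = np.array(Image.open(path).convert('RGB'))
    label = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, label_id in COLOR_TO_ID.items():
        label[(rgb == color).all(axis=-1)] = label_id
    Image.fromarray(label, mode='L').save(path)  # overwrites in place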