SegFormer fine-tuning

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): NVIDIA RTX PRO 4000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): SegFormer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

I want to fine-tune the SegFormer network on my own data (two classes: tumor segmentation and background). How can I get the pre-trained model and its spec?

Note: in tao_tutorials/notebooks/tao_launcher_starter_kit/segformer/segformer.ipynb at main · NVIDIA/tao_tutorials · GitHub

there is no instruction for downloading the pre-trained model.

I really appreciate your help.

Hi @eduardo.assuncao1 ,
SegFormer supports several kinds of backbones; see SegFormer — TAO Toolkit.

Please refer to
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_segformer_imagenet/ .
For example, the pretrained model for fan_base is at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_segformer_imagenet/files?version=fan_hybrid_base_in22k_1k_384.

A similar topic is: Training SegFormer with Nv-DinoV2 backbone on Segmentation Task - #2 by Morganh.

BTW, pretrained nv_dino_v2 models are available at, for example:
https://catalog.ngc.nvidia.com/orgs/nvaie/models/nv_dinov2_classification_model/files
https://catalog.ngc.nvidia.com/orgs/nvaie/models/imagenet_nv_dinov2/files

@Morganh, thank you for your reply.

I have managed to train the SegFormer model. However, performance on the foreground class is low:

!tao model segformer evaluate \
    -e $SPECS_DIR/test_isbi.yaml \
    evaluate.checkpoint=$RESULTS_DIR/isbi_experiment/train/segformer_model_latest.pth \
    results_dir=$RESULTS_DIR/isbi_experiment

Testing DataLoader 0: 100%|██████████| 67/67 [00:04<00:00, 15.59it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric ┃ DataLoader 0 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ F1_0 │ 0.9995958805084229 │
│ F1_1 │ 0.532546877861023 │
│ acc │ 0.9991925358772278 │
│ iou_0 │ 0.9991921782493591 │
│ iou_1 │ 0.3629055917263031 │
│ mf1 │ 0.7660713791847229 │
│ miou │ 0.6810488700866699 │
│ mprecision │ 0.9157484769821167 │
│ mrecall │ 0.695730984210968 │
│ precision_0 │ 0.9992848634719849 │
│ precision_1 │ 0.8322120308876038 │
│ recall_0 │ 0.9999071359634399 │
│ recall_1 │ 0.3915548324584961 │
└───────────────────────────┴───────────────────────────┘
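As a side note on reading these metrics: per-class F1 (the Dice score) and IoU are deterministically related, so the low iou_1 follows directly from the low foreground recall. A minimal sketch (plain Python, values copied from the table above):

```python
# Per-class F1 (Dice) from precision and recall, and IoU from F1:
#   F1  = 2PR / (P + R)
#   IoU = F1 / (2 - F1)
def f1_from_pr(p, r):
    return 2 * p * r / (p + r)

def iou_from_f1(f1):
    return f1 / (2 - f1)

# Foreground (class 1) values reported by `segformer evaluate` above
precision_1 = 0.8322120308876038
recall_1 = 0.3915548324584961

f1_1 = f1_from_pr(precision_1, recall_1)
iou_1 = iou_from_f1(f1_1)
print(f1_1)   # ~0.5325, matching F1_1
print(iou_1)  # ~0.3629, matching iou_1
```

So any change that lifts foreground recall will lift both numbers together.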

Here are the training and evaluation specs:
train_lidc.txt (1.2 KB)

test_isbi.txt (1.1 KB)

Here is a sample of my data:

Can you give me any tips to improve the performance?

You can set a larger input size (change 224 to 512) and use a larger backbone (e.g., fan_base).
Below is an example I ran with an older docker image, nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt. You can try it as well.

$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash

$ segformer train -e /localhome/local-morganh/segformer/fanbase.yaml

$ cat fanbase.yaml

results_dir: /localhome/local-morganh/segformer/fanbase 
train: 
  num_gpus: 1 
  exp_config: 
      manual_seed: 49 
  checkpoint_interval: 200 
  logging_interval: 10 
  max_iters: 20000 #5000 #10000 #5000 
  resume_training_checkpoint_path: null 
  validate: True 
  validation_interval: 10 #200 #50 
  trainer: 
      find_unused_parameters: True 
      sf_optim: 
        lr: 0.00006 
evaluate: 
  checkpoint: /localhome/local-morganh/segformer/fanbase/train/iter_20000.pth
model: 
  input_height: 512
  input_width: 512
  pretrained_model_path: /localhome/local-morganh/segformer/fan_hybrid_base_in22k_1k_384.pth  #https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_segformer_imagenet/files?version=fan_hybrid_base_in22k_1k_384 
  #pretrained_model_path: null 
  backbone: 
    type: "fan_base_16_p4_hybrid" 
dataset: 
  input_type: "grayscale" 
  img_norm_cfg: 
        mean: 
          - 127.5 
          - 127.5 
          - 127.5 
        std: 
          - 127.5 
          - 127.5 
          - 127.5 
        to_rgb: True 
  data_root: /tao-pt/tao-experiments 
  train_dataset: 
      img_dir: 
        - /localhome/local-morganh/segformer/data/image/train
      ann_dir: 
        - /localhome/local-morganh/segformer/data/mask/train
      pipeline: 
        augmentation_config: 
          random_crop: 
            #crop_size: 
            #  - 672 
            #  - 672 
            cat_max_ratio: 0.75 
          resize: 
            img_scale: 
              - 512 
              - 1024 
            ratio_range: 
              - 0.5 
              - 2.0 
          random_flip: 
            prob: 0.5
  val_dataset: 
      img_dir: 
        - /localhome/local-morganh/segformer/data/image/val
      ann_dir: 
        - /localhome/local-morganh/segformer/data/mask/val
  test_dataset: 
      img_dir: 
        - /localhome/local-morganh/segformer/data/image/test
      ann_dir: 
        - /localhome/local-morganh/segformer/data/mask/test
  palette: 
    - seg_class: background 
      rgb: 
        - 0 
        - 0 
        - 0 
      label_id: 0 
      mapping_class: background 
    - seg_class: foreground 
      rgb: 
        - 255 
        - 255 
        - 255 
      label_id: 1 
      mapping_class: foreground 
  repeat_data_times: 500 
  batch_size: 8 #4 #1 
  workers_per_gpu: 1 
export: 
  input_height: 512 
  input_width: 512 
  input_channel: 3 
  onnx_file: "${results_dir}/iter_500.onnx"
gen_trt_engine: 
  input_width: 512 
  input_height: 512 
  tensorrt: 
    data_type: FP32 
    workspace_size: 1024 
    min_batch_size: 1 
    opt_batch_size: 1 
    max_batch_size: 1 

Run evaluation
$ segformer evaluate -e /localhome/local-morganh/segformer/fanbase.yaml evaluate.checkpoint=/localhome/local-morganh/segformer/fanbase/train/iter_20000.pth

Run inference
$ segformer inference -e /localhome/local-morganh/segformer/fanbase.yaml inference.checkpoint=/localhome/local-morganh/segformer/fanbase/train/iter_20000.pth
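One thing worth checking before training against a palette like the one above (background = RGB 0,0,0; foreground = RGB 255,255,255) is that the mask files really contain only those two values; stray anti-aliased edge pixels silently corrupt the labels. A hedged sketch (my own illustration, plain Python; the file decoding, e.g. via Pillow, is left out and `pixels` stands for an already-decoded grayscale mask):

```python
# Sanity-check that a decoded grayscale mask uses only the values
# declared in the spec's palette (0 = background, 255 = foreground).
ALLOWED = {0, 255}  # must match the `palette` rgb entries in fanbase.yaml

def invalid_mask_values(pixels, allowed=ALLOWED):
    """Return the set of pixel values not covered by the palette."""
    seen = {v for row in pixels for v in row}
    return seen - allowed

# A clean binary mask passes...
clean = [[0, 0, 255], [0, 255, 255]]
assert invalid_mask_values(clean) == set()

# ...while an anti-aliased edge (value 128) is flagged.
blurred = [[0, 128, 255], [0, 255, 255]]
print(invalid_mask_values(blurred))  # {128}
```

If this flags values, re-threshold the masks before training rather than letting the loader map them arbitrarily.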

I have a question regarding the fanbase.yaml spec: do I need to make any modifications to adapt it to my custom data? For example, my image size is 512x512 with just one channel.
In your spec (fanbase.yaml) I see things like:

img_scale:
  - 512
  - 1024

img_norm_cfg:
  mean:
    - 127.5
    - 127.5
    - 127.5

palette:
  - seg_class: background
    rgb:
      - 0
      - 0
      - 0

export:
  input_height: 512
  input_width: 512
  input_channel: 3

gen_trt_engine:
  input_width: 512
  input_height: 512

My question, regarding the above configuration, is related to image size and channels. Do I need to make any changes?

My previous experiment also ran with 512x512 single-channel images, so you can take it as a reference.
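Regarding the channels: with input_type: "grayscale" and to_rgb: True, the single channel is replicated to three, and the mean/std of 127.5 map 8-bit values into roughly [-1, 1]. A minimal sketch of that preprocessing (my own illustration of the img_norm_cfg arithmetic, not TAO's internal code):

```python
# Illustration of the img_norm_cfg above: replicate one grayscale
# channel to three, then normalize with mean=127.5, std=127.5.
MEAN, STD = 127.5, 127.5

def preprocess_pixel(gray_value):
    """Map one 8-bit grayscale value to a normalized 3-channel tuple."""
    normalized = (gray_value - MEAN) / STD
    return (normalized, normalized, normalized)

print(preprocess_pixel(0))    # (-1.0, -1.0, -1.0)
print(preprocess_pixel(255))  # (1.0, 1.0, 1.0)
```

This is why input_channel stays at 3 in the export section even for grayscale data.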

I couldn’t run your previous experiment due to GPU incompatibility:

docker run --gpus all -it --rm \
    -u $(id -u):$(id -g) \
    -v /home/cvig/CVIG/Devel/tao_experiments_segformer_ccg:/workspace/tao_experiments \
    nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash

===========================

=== TAO Toolkit PyTorch ===

NVIDIA Release 5.5.0-PyT (build 88113656)
TAO Toolkit Version 5.5.0

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:

WARNING: Detected NVIDIA RTX PRO 4000 Blackwell Generation Laptop GPU GPU, which is not yet supported in this version of the container
ERROR: No supported GPU(s) detected to run this container

However, I managed to get a better result using a larger model and image resolution (512x512):

model_epoch_109_step_38060.pth
Testing DataLoader 0: 100%|██████████| 67/67 [00:20<00:00, 3.28it/s]
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Test metric ┃ DataLoader 0 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ F1_0 │ 0.9996967315673828 │
│ F1_1 │ 0.7115785479545593 │
│ acc │ 0.9993942379951477 │
│ iou_0 │ 0.9993938207626343 │
│ iou_1 │ 0.5522871613502502 │
│ mf1 │ 0.8556376695632935 │
│ miou │ 0.7758404612541199 │
│ mprecision │ 0.9048025012016296 │
│ mrecall │ 0.8171431422233582 │
│ precision_0 │ 0.9995691180229187 │
│ precision_1 │ 0.8100359439849854 │
│ recall_0 │ 0.999824583530426 │
│ recall_1 │ 0.6344617605209351 │
└───────────────────────────┴───────────────────────────┘

Here is the spec: train_lidc.txt (1.2 KB)

@Morganh, do you know if there is any parameter so that we can mitigate the problem of the imbalanced classes (foreground and background)?

Can we apply zoom (augmentation) to improve detection of small objects?
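(On zoom: the resize block in the spec above, with ratio_range 0.5 to 2.0, already applies random scale jitter, which acts as a zoom augmentation.) On imbalance, I don't know offhand which TAO parameter, if any, exposes per-class loss weighting, but as background for the question: a common remedy is inverse-frequency class weights computed from pixel counts. A sketch of that computation (plain Python, synthetic masks, not TAO code):

```python
from collections import Counter

# Inverse-frequency class weights from label masks: classes covering
# fewer pixels get proportionally larger loss weights.
def class_weights(masks, num_classes=2):
    counts = Counter()
    for mask in masks:
        for row in mask:
            counts.update(row)
    total = sum(counts.values())
    freq = [counts.get(c, 0) / total for c in range(num_classes)]
    # weight inversely proportional to frequency, normalized to mean 1
    inv = [1.0 / f if f > 0 else 0.0 for f in freq]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]

# Synthetic example: foreground (label 1) covers 2 of 8 pixels,
# so it receives 3x the background's weight.
masks = [[[0, 0, 0, 1], [0, 0, 0, 1]]]
print(class_weights(masks))
```

Such weights would then be fed to whatever weighted-loss hook the training framework provides, if one exists.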

OK, the NVIDIA RTX PRO 4000 (Blackwell) is compatible with the TAO 6.x docker, not the TAO 5.5 docker.

Glad to know the result is better now. So you are still running with the TAO 6.0 docker instead of the TAO 5.5 docker, right? Just to confirm, since your latest training yaml uses TAO 6's format.

Yes, I am still running with the TAO 6.0 docker because of the error I get when I try to run with the TAO 5.5 docker: “No supported GPU(s) detected to run this container”.