TAO5 unet vs segformer

In the NVIDIA article *Improve Accuracy and Robustness of Vision AI Apps with Vision Transformers and NVIDIA TAO*, the implied message is that SegFormer should be preferred for accuracy and robustness.

Yet I trained both the TAO5 UNet and SegFormer examples out of the box, with very different results:

UNet inference:

```
{'foreground': {'precision': 0.7040575, 'Recall': 0.74182844, 'F1 Score': 0.7224496623348184, 'iou': 0.565496}, 'background': {'precision': 0.9340356, 'Recall': 0.9214057, 'F1 Score': 0.9276776205874399, 'iou': 0.8651108}}
```

SegFormer inference:

| Class      | IoU   | Acc   |
|------------|-------|-------|
| foreground | 36.33 | 41.23 |
| background | 84.15 | 96.6  |
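Note that the two tools report on different scales: UNet gives fractions while SegFormer gives percentages, so the foreground IoUs to compare are 56.55% vs 36.33%. The UNet numbers are also internally consistent, which is easy to verify: from precision p and recall r, F1 = 2pr/(p+r) and IoU = 1/(1/p + 1/r - 1). A quick check in plain Python (no TAO dependency):

```python
# Consistency check for the UNet metrics above.
# From TP/FP/FN counts: precision p = TP/(TP+FP), recall r = TP/(TP+FN),
# hence F1 = 2*p*r/(p+r) and IoU = TP/(TP+FP+FN) = 1/(1/p + 1/r - 1).

def f1_and_iou(p: float, r: float) -> tuple[float, float]:
    f1 = 2 * p * r / (p + r)
    iou = 1.0 / (1.0 / p + 1.0 / r - 1.0)
    return f1, iou

# Foreground numbers from the UNet inference output above:
f1, iou = f1_and_iou(0.7040575, 0.74182844)
print(round(f1, 4), round(iou, 4))  # 0.7224 0.5655, matching the reported values
```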

Clearly, SegFormer is not performing better here by any means.

Is it worth migrating my UNet models to SegFormer? If so, how?

Continuing the discussion from *Migrating TAO3 unet model to segformer, Foreground has performance of 0.0 !*:

In my second-to-last post, I explained the very bad results I was getting with my custom dataset and SegFormer on TAO4, which led me to abandon the idea of using SegFormer with my data: it would not converge despite running for several days.

@Morganh's answer at the time was to add more data, but these custom images are very expensive, and I don't want to spend a lot of money generating more of them just to find out that I am in the same place.

The question: under what dataset or training conditions can I expect better results from SegFormer?


When you ran the above experiments for both UNet and SegFormer, were you running the default notebook and the default spec file?

@Morganh That's exactly what I said: out of the box, meaning no changes were made, neither to the specs nor to the dataset.

Thanks for the info. I will check further.

@Morganh Thanks

For SegFormer in TAO 5.0, there are more backbones available; see SegFormer - NVIDIA Docs.
You can switch to a deeper backbone instead, for example mit_b5.

Also, in your current SegFormer experiment there is no pretrained model.
You can download the mit_b5 version of the pretrained model from NGC and set it in the training spec file.
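As a sketch only (the nesting mirrors the default notebook spec and may differ slightly from the exact TAO SegFormer schema; the path below is a placeholder for wherever you save the NGC checkpoint), the relevant part of the training spec would change along these lines:

```yaml
model:
  input_height: 512
  input_width: 512
  # placeholder path -- point this at the checkpoint downloaded from NGC
  pretrained_model_path: /workspace/pretrained/citysemsegformer_mit_b5.pth
  backbone:
    type: "mit_b5"   # deeper backbone instead of the default mit_b1
```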
The pretrained model can be found in


I did that, and it results in a cryptic error that I cannot figure out. Please see the attached error log.

segformer errors 2023 08 03.txt (7.0 KB)

Please confirm: I think you are saying to download the CitySemSegformer trainable_mit-b5_v1.0 model from NGC and change the spec file from:

```yaml
  input_height: 512
  input_width: 512
  pretrained_model_path: null
    type: "mit_b1"
```

to:

```yaml
  input_height: 512
  input_width: 512
  	- /pretrained/citysemsegformer_mit.pth
    type: "mit_b5"
```

This fails with:

```
yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
  in "/specs/train_isbi.yaml", line 18, column 3
```

Please double check your train_isbi.yaml.
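For reference, the ScannerError itself points at the root cause: YAML forbids tab characters in indentation, and the `\t` at line 18, column 3 is the tab in front of the `- /pretrained/...` line. Re-indented with spaces only (and assuming `pretrained_model_path` takes a single path, which is worth checking against the TAO SegFormer schema), that fragment would look like:

```yaml
  input_height: 512
  input_width: 512
  pretrained_model_path: /pretrained/citysemsegformer_mit.pth
```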

@Morganh Thanks for the suggestion here!

Using the mit_b5 pretrained weights made a great difference.

After the default training, the evaluation of the resulting model is:

| Class      | IoU   | Acc   |
|------------|-------|-------|
| foreground | 64.04 | 75.23 |
| background | 89.99 | 95.6  |

| Scope  | mIoU  | mAcc  | aAcc |
|--------|-------|-------|------|
| global | 77.01 | 85.42 | 91.5 |
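As a quick sanity check, the summary row is just the unweighted mean of the two class rows (aAcc is pixel-weighted, so it cannot be derived from this table alone); in plain Python:

```python
# mIoU and mAcc in the SegFormer evaluation summary are the unweighted
# means of the per-class IoU and Acc values from the table above.
class_iou = {"foreground": 64.04, "background": 89.99}
class_acc = {"foreground": 75.23, "background": 95.6}

miou = sum(class_iou.values()) / len(class_iou)
macc = sum(class_acc.values()) / len(class_acc)
print(miou, macc)  # about 77.015 and 85.415, matching the rounded 77.01 / 85.42
```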

And the resulting inference images:

Many thanks!

I'd also like to test the FAN backbone.

I found the trainable_fan_v1.0 pretrained weights file here.

For which of the fan backbones is this applicable?


@Morganh thanks!

It is fan_base_16_p4_hybrid.
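For anyone following along, selecting that backbone in the spec would look something like this (a sketch only; verify the nesting against the TAO SegFormer schema, and the checkpoint path is a placeholder for wherever you save the trainable_fan_v1.0 file):

```yaml
model:
  # placeholder path -- point this at the downloaded trainable_fan_v1.0 checkpoint
  pretrained_model_path: /workspace/pretrained/fan_hybrid_base.pth
  backbone:
    type: "fan_base_16_p4_hybrid"
```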

@Morganh Thanks!
