In the NVIDIA article "Improve Accuracy and Robustness of Vision AI Apps with Vision Transformers and NVIDIA TAO", the implied message is to use SegFormer for accuracy and robustness.
Yet I trained both the TAO 5 UNet and SegFormer examples out of the box and got very different results:
In my previous-to-last post I explained the very bad results I was getting with my custom dataset and SegFormer on TAO 4, which led me to abandon the idea of using SegFormer with my data because it would not converge despite running for several days…
@Morganh's answer then was to add more data, but these custom images are very expensive, and I don't want to spend a lot of money generating more of them just to find out that I am in the same place.
The question: under what dataset or training conditions can I expect better results from SegFormer?
Also, in your current experiment for SegFormer there is no pretrained model. You can download the mit_b5 version of the pretrained model from NGC and set it in the training spec file.
The pretrained model can be found in
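For reference, a minimal sketch of how the downloaded weights might be wired into the training spec. The key names (model.pretrained_model_path, model.backbone.type) and the local path here are assumptions based on the TAO 5 SegFormer spec layout, so verify them against the spec file shipped with your TAO version; the NGC CLI's ngc registry model download-version command can fetch the checkpoint once you have its registry path.

```yaml
# Hypothetical excerpt of a TAO 5 SegFormer training spec (train_isbi.yaml).
# Exact key names may differ between TAO releases; check your version's docs.
model:
  backbone:
    type: "mit_b5"   # backbone variant matching the NGC checkpoint
  pretrained_model_path: /workspace/pretrained/mit_b5.pth  # local path to weights downloaded from NGC
```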
```
yaml.scanner.ScannerError: while scanning for the next token
found character '\t' that cannot start any token
  in "/specs/train_isbi.yaml", line 18, column 3
```