Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc): NVIDIA RTX PRO 4000
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): SegFormer
• TLT Version (please run "tlt info --verbose" and share the "docker_tag" here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I want to fine-tune the SegFormer network on my own data (two classes: tumor and background). How can I get the pretrained model and its spec file?
Please refer to https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_segformer_imagenet/ .
For example, pretrained model for fan_base is in https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/pretrained_segformer_imagenet/files?version=fan_hybrid_base_in22k_1k_384.
BTW, for nv_dino_v2 models, see for example https://catalog.ngc.nvidia.com/orgs/nvaie/models/nv_dinov2_classification_model/files and https://catalog.ngc.nvidia.com/orgs/nvaie/models/imagenet_nv_dinov2/files .
You can set a larger input size (change 224 to 512) and use a larger backbone (e.g., fan_base).
Below is an example I ran with an older docker image, nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt. You can try it as well.
I have a question regarding the fanbase.yaml spec. Do I need to make any modifications to adapt it to my custom data? For example, my images are 512x512 with just one channel.
In your spec (fanbase.yaml) I see something like:
img_scale:
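For reference, here is a sketch of how the dataset section might be adapted for 512x512 single-channel images. The layout loosely follows the TAO 5.x SegFormer spec; treat the exact field names and values as assumptions to check against the fanbase.yaml shipped with the container:

```yaml
# Sketch only -- verify each key against the spec shipped with the container.
dataset:
  input_type: "grayscale"          # assumption: single-channel input switch
  img_norm_cfg:
    mean: [127.5]                  # one value per channel for grayscale
    std: [127.5]
  train_dataset:
    pipeline:
      augmentation_config:
        resize:
          img_scale: [512, 512]    # match the native image size
        random_crop:
          crop_size: [512, 512]
```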
I couldn't run your previous experiment due to GPU compatibility:
docker run --gpus all -it --rm \
  -u $(id -u):$(id -g) \
  -v /home/cvig/CVIG/Devel/tao_experiments_segformer_ccg:/workspace/tao_experiments \
  nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash
===========================
=== TAO Toolkit PyTorch ===
NVIDIA Release 5.5.0-PyT (build 88113656)
TAO Toolkit Version 5.5.0
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the TAO Toolkit End User License Agreement.
By pulling and using the container, you accept the terms and conditions of this license:
WARNING: Detected NVIDIA RTX PRO 4000 Blackwell Generation Laptop GPU GPU, which is not yet supported in this version of the container
ERROR: No supported GPU(s) detected to run this container
However, I managed to get a better result using a larger model and a larger image resolution (512x512):
OK, the NVIDIA RTX PRO 4000 (Blackwell) is compatible with the TAO 6.x docker rather than the TAO 5.5 docker.
Glad to know the result is better now. So you are still running with the TAO 6.0 docker instead of the TAO 5.5 docker, right? Just to confirm, since your latest training yaml is TAO 6's version.
You can try to train from scratch first.
Then try to use the pretrained models.
For c_radio, the models are in https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/cradiov2/files .
For nvdino_v2, the models are in https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/imagenet_nv_dinov2?version=trainable_v1.1.
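To fine-tune from the downloaded weights, they are pointed to from the model section of the spec. A hedged sketch only: the backbone string and key names below are assumptions, so verify them against the SegFormer spec documentation for your TAO version:

```yaml
# Sketch only -- verify key names against the TAO SegFormer spec docs.
model:
  backbone:
    type: "vit_large_nvdinov2"    # assumption: exact backbone string
    pretrained_backbone_path: /workspace/pretrained/nvdinov2_weights.pth  # hypothetical path
```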
I have trained c_radio_v2 and nvdino_v2 from scratch, but the performance is bad. When trying to fine-tune nvdino_v2, the following ERROR occurred:
Do ViT pretrained backbone interpolation
Error executing job with overrides: ['results_dir=/results/isbi_experiment', 'train.num_gpus=1']
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 72, in _func
    raise e
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 51, in _func
    runner(cfg, **kwargs)
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 94, in main
    run_experiment(
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/scripts/train.py", line 60, in run_experiment
    model = SegFormerPlModel(experiment_config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/segformer_pl_model.py", line 56, in __init__
    self._build_model(export)
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/segformer_pl_model.py", line 94, in _build_model
    self.model = build_model(experiment_config=self.experiment_spec, export=export)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/segformer.py", line 221, in build_model
    model = SegFormer(
            ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/segformer.py", line 124, in __init__
    self.backbone = vit_adapter_model_dict[self.model_name](
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/backbones/nvdinov2.py", line 85, in __init__
    pretrained_backbone_ckp = interpolate_vit_checkpoint(checkpoint=pretrained_backbone_ckp,
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/cv/segformer/model/backbones/nvdinov2.py", line 185, in interpolate_vit_checkpoint
    checkpoint = interpolate_patch_embed(checkpoint=checkpoint, new_patch_size=target_patch_size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nvidia_tao_pytorch/core/utils/pos_embed_interpolation.py", line 87, in interpolate_patch_embed
    patch_embed = checkpoint['patch_embed.proj.weight']
                  ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'patch_embed.proj.weight'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
2026-01-19 12:10:35,464 [TAO Toolkit] [WARNING] root 339: Telemetry data couldn't be sent, but the command ran successfully.
2026-01-19 12:10:35,464 [TAO Toolkit] [WARNING] root 342: [Error]: 'str' object has no attribute 'decode'
2026-01-19 12:10:35,464 [TAO Toolkit] [WARNING] root 346: Execution status: FAIL
2026-01-19 12:10:36,142 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 371: Stopping container.
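A KeyError like this usually means the checkpoint's state dict nests or prefixes its keys differently from what the interpolation code expects. In practice you would first load the real file with `torch.load(path, map_location="cpu")` and inspect `list(ckpt.keys())`, since weights sometimes sit under a wrapper key such as "state_dict". Below is a minimal, self-contained sketch of that kind of check and fix; the "backbone." prefix is an invented example, not the actual NVDINOv2 checkpoint layout:

```python
# Simulated state dict: the weight exists, but under a prefix, so the exact
# lookup "patch_embed.proj.weight" fails. The "backbone." prefix is an
# invented example, not the actual NVDINOv2 checkpoint layout.
checkpoint = {"backbone.patch_embed.proj.weight": [[0.0]]}  # list stands in for a tensor

def strip_prefix(state_dict, prefix):
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

print("patch_embed.proj.weight" in checkpoint)                             # False: the KeyError case
print("patch_embed.proj.weight" in strip_prefix(checkpoint, "backbone."))  # True
```

If the candidate keys printed by such a check only differ by a prefix, remapping them before passing the checkpoint to training is usually enough; otherwise the checkpoint may simply not be the variant the backbone expects.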
May I know how large your training dataset is? The result so far is better than the result from training from scratch.
Could you share the full training log? Please check how the loss changes.
Yes, training by fine-tuning is better. I think that if I could train at an image resolution of 512x512, the result would be even better. But when I try to train at this resolution, it raises an ERROR.