TAO 4 Segformer Input and output dimensions and tensors

david9xqqb · February 14, 2023, 12:27pm

I'm looking to clarify the image size options and how to specify them, as well as the shapes of the model's input and output tensors once it is exported to TensorRT, so that I can feed images to the model and receive inference masks.

I'm also interested in knowing whether they are NHWC or NCHW.

Many thanks!

Dave

Morganh · February 15, 2023, 4:58am

Refer to Deploying to Deepstream — TAO Toolkit 4.0 documentation

You can also use “polygraphy inspect model xxx.engine” to check the input/output tensors.
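The thread never states the layout outright, so the sketch below assumes an NCHW input (common for PyTorch-derived exports); confirm it against the shapes that polygraphy reports before relying on it. It converts one 1280x720 HxWx3 frame into a 1x3x720x1280 float32 batch, using the mean/std from the img_norm_cfg in the example spec later in this thread, which we assume are in RGB channel order (to_rgb: true). The function names and the bgr_input flag are hypothetical, and scores_to_mask only applies if the engine returns per-class scores rather than label IDs.

```python
import numpy as np

# Mean/std copied from the example spec's img_norm_cfg (0-255 pixel units).
# Assumption: they are listed in RGB order, since the spec sets to_rgb: true.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def to_nchw(image_hwc: np.ndarray, bgr_input: bool = False) -> np.ndarray:
    """Turn one HxWx3 uint8 image into a normalized 1x3xHxW float32 batch.

    Assumes the engine input is NCHW; verify with `polygraphy inspect model`.
    """
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {image_hwc.shape}")
    img = image_hwc.astype(np.float32)
    if bgr_input:
        # Frames decoded by OpenCV are BGR; reverse the channel axis to get RGB.
        img = img[..., ::-1]
    img = (img - MEAN) / STD            # per-channel normalization, broadcast over H and W
    chw = np.transpose(img, (2, 0, 1))  # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis])  # add batch dim -> NCHW, contiguous for copying


def scores_to_mask(scores_nchw: np.ndarray) -> np.ndarray:
    """Only if the output is per-class scores (N, C, H, W): pick the best class per pixel."""
    return np.argmax(scores_nchw, axis=1).astype(np.int32)  # -> (N, H, W) label IDs
```

If polygraphy shows an NHWC input instead, drop the transpose and keep only the batch dimension.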

david9xqqb · February 15, 2023, 7:38am

@Morganh Thanks!

That was very important also because of the statement:

> Segformer models require the TensorRT OSS build because several prerequisite TensorRT plugins are only available in the TensorRT open source repo.

Which will save me aggravation down the line…

What I am missing is how, in the Segformer training spec, to set the image size for multiclass RGB images of size 1280 x 720.

Thanks Again,

David

Morganh · February 17, 2023, 2:29am

That means you need to use the TensorRT OSS repo to build a new /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so and replace the existing one with it.

Change size_ht and size_wd under ‘Pad’, and the two values under crop_size. The first value under crop_size is the height. For example, if you want to train at 720x1280 (HxW) resolution, the config will look like this:

train_pipeline:
    augmentation_config:
      random_crop:
        crop_size:
          - 720
          - 1280
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 2048  # or 1024
          - 1280
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5
    Pad:
      size_ht: 720
      size_wd: 1280
      pad_val: 0
      seg_pad_val: 255

For img_scale, please set the second parameter to the shortest input resolution. For the first parameter, you can set any value greater than your shortest input resolution, up to 2048. However, it is advisable to use equal height and width to avoid having to set values in multiple places.
Also, img_scale should be >= crop_size.
We will improve the documentation in the next release.
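The rules of thumb in this reply can be collected into a small pre-flight check before launching training. This is a hypothetical helper, not part of TAO: it assumes Pad size_ht/size_wd should match crop_size (as in both examples in this thread), interprets "img_scale >= crop_size" element by element in the order the values are written in the spec, and applies the 2048 upper bound to the first img_scale value.

```python
def check_segformer_sizes(crop_size, img_scale, pad_ht, pad_wd, max_scale=2048):
    """Return a list of problems; an empty list means the sizes are consistent.

    crop_size and img_scale are two-element lists as written in the spec;
    the first crop_size value is the height.
    """
    problems = []
    crop_h, crop_w = crop_size

    # Pad should pad to the same size that random_crop produces.
    if (pad_ht, pad_wd) != (crop_h, crop_w):
        problems.append(f"Pad {pad_ht}x{pad_wd} differs from crop_size {crop_h}x{crop_w}")

    # Assumed element-wise reading of "img_scale should be >= crop_size".
    for position, scale, crop in (("first", img_scale[0], crop_h), ("second", img_scale[1], crop_w)):
        if scale < crop:
            problems.append(f"img_scale {position} value {scale} < crop_size value {crop}")

    if img_scale[0] > max_scale:
        problems.append(f"img_scale first value {img_scale[0]} exceeds {max_scale}")
    return problems


# The 720x1280 spec from this reply:
print(check_segformer_sizes([720, 1280], [2048, 1280], pad_ht=720, pad_wd=1280))
```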

david9xqqb · February 17, 2023, 8:03am

Thanks @Morganh. I did not understand what you mean by that. All my images are 1280X720…

Which height and width parameters are for what? This is very unclear…

David

david9xqqb · February 17, 2023, 1:45pm

@Morganh: I modified the spec file, but it did not give good results.

The modified spec file is here…

train_isbi.yaml (1.4 KB)

I reported the bad results in a separate post, since it appears to me to be a separate issue…

https://forums.developer.nvidia.com/t/migrating-tao3-unet-model-to-segformer-foreground-has-performance-of-0-0/243149/3

Thanks!

Morganh · February 19, 2023, 5:08pm

For “img_scale”, you can refer to the setting above for your 1280x720 images.

david9xqqb · February 19, 2023, 8:21pm

@Morganh . I apologize, but I still don’t understand, and I am confused.

I want to create a Segformer model for grayscale images of 1280X720 with background/foreground classes… What settings do I need to define?

And for a second model with RGB images of the same size (1280X720) and 6 classes?

Many thanks,

David

Morganh · February 20, 2023, 2:36pm

Refer to SegFormer - NVIDIA Docs and SegFormer - NVIDIA Docs

See Data Annotation Format - NVIDIA Docs
For color/RGB input images, each mask image is a single-channel or three-channel image with the same size as the input image. Every pixel in the mask should have an integer value that represents the segmentation class label_id.
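The mask requirements above can be checked per image pair before training. This is a hypothetical helper, not a TAO tool: it verifies that the mask is single- or three-channel, matches the image's height and width, uses an integer dtype, and contains only known label_id values. It checks all values regardless of channel count and makes no assumption about what each channel of a three-channel mask means.

```python
import numpy as np


def validate_mask(image: np.ndarray, mask: np.ndarray, label_ids) -> list:
    """Return a list of problems with one image/mask pair (empty list = OK)."""
    problems = []
    if mask.ndim not in (2, 3) or (mask.ndim == 3 and mask.shape[2] not in (1, 3)):
        problems.append(f"mask should be single- or three-channel, got shape {mask.shape}")
    if mask.shape[:2] != image.shape[:2]:
        problems.append(f"mask size {mask.shape[:2]} != image size {image.shape[:2]}")
    if not np.issubdtype(mask.dtype, np.integer):
        problems.append(f"mask dtype {mask.dtype} is not an integer type")
    # Every pixel value must be one of the configured label_id values.
    unknown = sorted(set(np.unique(mask).tolist()) - set(label_ids))
    if unknown:
        problems.append(f"values not in label_ids: {unknown}")
    return problems


# Grayscale foreground/background case: label_ids 0 and 1.
gray = np.zeros((720, 1280), dtype=np.uint8)
mask = np.zeros((720, 1280), dtype=np.uint8)
print(validate_mask(gray, mask, label_ids=[0, 1]))
```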

david9xqqb · February 20, 2023, 10:11pm

After reading the documentation several times, I still don’t know how to specify in the yaml spec file that the images are 1280X720 for training, evaluation (testing) and inference…

There are terms such as pad, scale, and multi_scale whose meaning I don’t understand…

@Morganh Thanks!

Morganh · February 21, 2023, 3:41am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

For the part below:

      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0

A ratio is randomly sampled from the range specified by ratio_range and then multiplied by img_scale to generate the sampled scale.
img_scale is the base image scale that the ratio is multiplied with.
ratio_range contains the minimum and maximum ratios used to scale img_scale.
In the example above, the minimum ratio is 0.5 and the maximum ratio is 2.0.
The new height is therefore randomly set within the range 1024x0.5 to 1024x2.0 (512 to 2048), and the new width within the range 512x0.5 to 512x2.0 (256 to 1024), using the same sampled ratio for both. So the augmented image resolution is (new_height, new_width).
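The sampling described above can be sketched in a few lines. This is a minimal illustration, not TAO's code: it draws a single ratio uniformly from ratio_range and applies it to both img_scale values, following the height-first reading given in this reply. The real resize step may handle aspect ratio or rounding differently; sample_scale is a hypothetical name.

```python
import random


def sample_scale(img_scale, ratio_range, rng=random):
    """Draw one ratio from ratio_range and scale both img_scale values by it."""
    min_ratio, max_ratio = ratio_range
    if min_ratio > max_ratio:
        raise ValueError("ratio_range must be [min, max]")
    ratio = rng.uniform(min_ratio, max_ratio)  # the same ratio is used for both dimensions
    height, width = img_scale
    return int(height * ratio), int(width * ratio)


# With img_scale [1024, 512] and ratio_range [0.5, 2.0], heights fall in 512..2048
# and widths in 256..1024, always in the same 2:1 proportion.
rng = random.Random(0)
print([sample_scale((1024, 512), (0.5, 2.0), rng) for _ in range(3)])
```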

The validation config contains “multi_scale”, which is used for validation during training. multi_scale is the largest image scale.

Pad: the padding augmentation.
size_ht (int): The height to which the image/mask is padded.
size_wd (int): The width to which the image/mask is padded.
pad_val (int): The padding value for the input image.
seg_pad_val (int): The padding value for the segmentation mask.
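To make the four Pad parameters concrete, here is a minimal NumPy sketch of what such a step does: it grows the image and mask to size_ht x size_wd, filling new image pixels with pad_val and new mask pixels with seg_pad_val. Padding on the bottom and right edges is an assumption, as is the helper name pad_to. In mmsegmentation-style pipelines 255 is the conventional ignore index, which is likely why seg_pad_val is 255 here, but treat that as an assumption too.

```python
import numpy as np


def pad_to(image, mask, size_ht, size_wd, pad_val=0, seg_pad_val=255):
    """Pad image (HxW or HxWxC) and mask (HxW) up to size_ht x size_wd."""
    h, w = image.shape[:2]
    if mask.shape[:2] != (h, w):
        raise ValueError("image and mask must have the same height and width")
    if h > size_ht or w > size_wd:
        raise ValueError(f"{h}x{w} is larger than the pad size {size_ht}x{size_wd}")
    pad_rows, pad_cols = size_ht - h, size_wd - w

    # Assumption: padding goes on the bottom and right; extra axes (channels) are untouched.
    img_width = [(0, pad_rows), (0, pad_cols)] + [(0, 0)] * (image.ndim - 2)
    mask_width = [(0, pad_rows), (0, pad_cols)] + [(0, 0)] * (mask.ndim - 2)
    padded_img = np.pad(image, img_width, mode="constant", constant_values=pad_val)
    padded_mask = np.pad(mask, mask_width, mode="constant", constant_values=seg_pad_val)
    return padded_img, padded_mask


img, msk = pad_to(np.ones((500, 400, 3), np.uint8), np.zeros((500, 400), np.uint8), 512, 512)
print(img.shape, msk.shape)
```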

Example for RGB images with multiple classes:

dataset_config:
  data_root: ???
  val_img_dir: ???
  val_ann_dir: ???
  train_img_dirs:  ???
  train_ann_dirs: ???
  img_norm_cfg:
      mean:
        - 123.675
        - 116.28
        - 103.53
      std:
        - 58.395
        - 57.12
        - 57.375
      to_rgb: true
  palette:
    - seg_class: car
      rgb:
        - 255
        - 0
        - 0
      label_id: 1
      mapping_class: car
    - seg_class: background
      rgb:
        - 10
        - 10
        - 10
      label_id: 0
      mapping_class: background
    - seg_class: road
      rgb:
        - 128
        - 64
        - 128
      label_id: 7
      mapping_class: road
    - seg_class: sidewalk
      rgb:
        - 244
        - 35
        - 232
      label_id: 8
      mapping_class: sidewalk
    - seg_class: building
      rgb:
        - 70
        - 70
        - 70
      label_id: 11
      mapping_class: building
    - seg_class: wall
      rgb:
        - 102
        - 102
        - 102
      label_id: 12
      mapping_class: wall
  train_pipeline:
    augmentation_config:
      random_crop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
  repeat_data_times: 500
  batch_size_per_gpu: 2
  workers_per_gpu: 2
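The palette above maps six label_id values to rgb triples. Assuming the rgb field is the color used to display that class, a predicted label_id mask can be colorized with a lookup table, as sketched below. The palette is copied from the spec (mapping_class omitted); build_lut and colorize are hypothetical helpers, and any label_id missing from the palette renders as black, which stays distinguishable from background's [10, 10, 10].

```python
import numpy as np

# Palette entries from the example spec above (mapping_class omitted).
PALETTE = [
    {"seg_class": "car", "rgb": [255, 0, 0], "label_id": 1},
    {"seg_class": "background", "rgb": [10, 10, 10], "label_id": 0},
    {"seg_class": "road", "rgb": [128, 64, 128], "label_id": 7},
    {"seg_class": "sidewalk", "rgb": [244, 35, 232], "label_id": 8},
    {"seg_class": "building", "rgb": [70, 70, 70], "label_id": 11},
    {"seg_class": "wall", "rgb": [102, 102, 102], "label_id": 12},
]


def build_lut(palette, size=256):
    """Build a (size, 3) uint8 table so that lut[label_id] is that class's color."""
    ids = [entry["label_id"] for entry in palette]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate label_id in palette")
    lut = np.zeros((size, 3), dtype=np.uint8)  # unlisted IDs stay black
    for entry in palette:
        lut[entry["label_id"]] = entry["rgb"]
    return lut


def colorize(label_mask, lut):
    """Map an (H, W) integer label_id mask to an (H, W, 3) RGB image via fancy indexing."""
    return lut[label_mask]


lut = build_lut(PALETTE)
print(colorize(np.array([[0, 1], [7, 12]], dtype=np.uint8), lut).shape)
```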

system · March 20, 2023, 7:28am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
