TAO 4 Segformer Input and output dimensions and tensors

I'm looking to clarify the image size options and how to specify them, as well as the input and output tensor shapes of the model once it is exported to TensorRT, so that I can feed images to the model and receive inference masks.

I'm also interested in knowing whether they are NHWC or NCHW.

Many thanks!

Dave

Refer to Deploying to Deepstream — TAO Toolkit 4.0 documentation

You can also use “polygraphy inspect model xxx.engine” to check the input/output tensors.
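If it is easier, you can also query the engine programmatically. Below is a minimal sketch using the TensorRT Python API (TensorRT 8.x-style binding calls; “model.engine” is a placeholder path). This also answers the layout question: a shape like (N, C, H, W) indicates NCHW, while (N, H, W, C) indicates NHWC.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a previously exported engine file.
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# List every binding with its name, role (input/output), and shape.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))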

@Morganh Thanks!

That was very important also because of the statement:

> Segformer models require the TensorRT OSS build because several prerequisite TensorRT plugins are only available in the TensorRT open source repo.

Which will save me aggravation down the line…

What I am still missing is in the SegFormer training specs, where it's not clear how to set the image size for a multiclass RGB dataset with images of 1280 x 720.

Thanks Again,

David

That means it is necessary to build a new /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so from the TensorRT OSS repo and replace the existing one.

Change size_ht and size_wd under “Pad”, and the two values under crop_size. The first parameter under crop_size is height. So, for example, if you want to train at 720x1280 (h x w) resolution, this is how the config will look:

train_pipeline:
    augmentation_config:
      random_crop:
        crop_size:
          - 720
          - 1280
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 2048  # or 1024
          - 1280
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5
    Pad:
      size_ht: 720
      size_wd: 1280
      pad_val: 0
      seg_pad_val: 255
 

For img_scale, set the second parameter to the shorter side of your input resolution. The first parameter can be any value greater than the shorter side, up to 2048. But it is advisable to use equal height and width, to avoid the hassle of setting values in multiple places.
Also, img_scale should be >= crop_size.
We will improve the documentation in the next release.

Thanks @Morganh. I did not understand what you meant by that. All my images are 1280x720…

Which parameters are the height and width, and what do they apply to? This is very unclear…

David

@Morganh: I modified the spec file, but it did not give good results.

The modified spec file is here…

train_isbi.yaml (1.4 KB)

The bad results I reported in a separate post since it appears to me as a separate issue…

https://forums.developer.nvidia.com/t/migrating-tao3-unet-model-to-segformer-foreground-has-performance-of-0-0/243149/3

Thanks!

For the “img_scale”, you can refer to the settings above for your 1280x720 images.

@Morganh . I apologize, but I still don’t understand, and I am confused.

I want to create a SegFormer model with grayscale images of 1280x720 and background/foreground classes. What are the settings I need to define?

And for a second model with RGB images of the same 1280x720 size, multiclass with 6 classes?

Many thanks,

David

Refer to SegFormer - NVIDIA Docs and SegFormer - NVIDIA Docs

See Data Annotation Format - NVIDIA Docs
For color/RGB input images, each mask image is a single-channel or three-channel image with the same size as the input image. Every pixel in the mask should have an integer value that represents the segmentation class label_id.
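As a quick sanity check that a mask follows this format, here is a minimal sketch (my own illustration; the file name and label_id set are hypothetical, and it assumes numpy and Pillow are installed):

import numpy as np
from PIL import Image

# Load a mask and confirm every pixel holds one of the expected label_ids.
mask = np.array(Image.open("mask_0001.png"))  # hypothetical file name
expected_label_ids = {0, 1}  # e.g. background/foreground; use the label_ids from your palette

print("mask shape:", mask.shape)  # (H, W) for single-channel, (H, W, 3) for three-channel
values = set(np.unique(mask).tolist())
assert values <= expected_label_ids, f"unexpected pixel values: {values - expected_label_ids}"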

After reading the documentation several times, I still don't know how to specify in the yaml spec file that the images are 1280x720 for training, evaluation (testing), and inference…

There are terms such as pad, scale, and multi_scale whose meaning I don't understand…

@Morganh Thanks!

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

For the part below:

      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0

A ratio is randomly sampled from the range specified by ratio_range, and then multiplied with img_scale to generate the sampled scale.
img_scale is the base image scale that gets multiplied by the ratio.
ratio_range gives the minimum and maximum ratios applied to img_scale.
In the example above, the minimum ratio is 0.5 and the maximum ratio is 2.0.
The new height is then randomly set in the range 1024x0.5 to 1024x2.0, and the new width in the range 512x0.5 to 512x2.0. So the augmented images' resolution is (new_height, new_width).
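As a rough illustration of that sampling (a minimal sketch of the behavior described above, not the actual TAO implementation; sample_scale is a made-up name):

import random

def sample_scale(img_scale, ratio_range):
    # Draw one ratio uniformly from ratio_range and apply it to both sides.
    ratio = random.uniform(ratio_range[0], ratio_range[1])
    return int(img_scale[0] * ratio), int(img_scale[1] * ratio)

# With img_scale = (1024, 512) and ratio_range = (0.5, 2.0),
# new heights fall in [512, 2048] and new widths in [256, 1024].
print(sample_scale((1024, 512), (0.5, 2.0)))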

The validation config contains “multi_scale” for validation during training. The multi_scale is the largest image scale used.

Pad is the padding augmentation:
size_ht (int): The height to which the image/mask is padded.
size_wd (int): The width to which the image/mask is padded.
pad_val (int): The padding value for the input image.
seg_pad_val (int): The padding value for the segmentation mask.
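For intuition, here is a minimal numpy sketch of such padding (my own illustration, assuming bottom/right padding; the real augmentation may differ in detail):

import numpy as np

def pad_to(img, mask, size_ht, size_wd, pad_val=0, seg_pad_val=255):
    # Pad image and mask on the bottom/right up to (size_ht, size_wd).
    pad_h = max(size_ht - img.shape[0], 0)
    pad_w = max(size_wd - img.shape[1], 0)
    img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=pad_val)
    # 255 is typically an "ignore" value for the loss on padded mask pixels.
    mask = np.pad(mask, ((0, pad_h), (0, pad_w)), constant_values=seg_pad_val)
    return img, mask

img_p, mask_p = pad_to(np.zeros((700, 1200, 3), np.uint8), np.zeros((700, 1200), np.uint8), 720, 1280)
print(img_p.shape, mask_p.shape)  # (720, 1280, 3) (720, 1280)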

Example for RGB and multiple classes:

dataset_config:
  data_root: ???
  val_img_dir: ???
  val_ann_dir: ???
  train_img_dirs:  ???
  train_ann_dirs: ???
  img_norm_cfg:
      mean:
        - 123.675
        - 116.28
        - 103.53
      std:
        - 58.395
        - 57.12
        - 57.375
      to_rgb: true
  palette:
    - seg_class: car
      rgb:
        - 255
        - 0
        - 0
      label_id: 1
      mapping_class: car
    - seg_class: background
      rgb:
        - 10
        - 10
        - 10
      label_id: 0
      mapping_class: background
    - seg_class: road
      rgb:
        - 128
        - 64
        - 128
      label_id: 7
      mapping_class: road
    - seg_class: sidewalk
      rgb:
        - 244
        - 35
        - 232
      label_id: 8
      mapping_class: sidewalk
    - seg_class: building
      rgb:
        - 70
        - 70
        - 70
      label_id: 11
      mapping_class: building
    - seg_class: wall
      rgb:
        - 102
        - 102
        - 102
      label_id: 12
      mapping_class: wall
  train_pipeline:
    augmentation_config:
      random_crop:
        crop_size:
          - 512
          - 512
        cat_max_ratio: 0.75
      resize:
        img_scale:
          - 1024
          - 512
        ratio_range:
          - 0.5
          - 2.0
      random_flip:
        prob: 0.5
    Pad:
      size_ht: 512
      size_wd: 512
      pad_val: 0
      seg_pad_val: 255
  repeat_data_times: 500
  batch_size_per_gpu: 2
  workers_per_gpu: 2
