Pre-trained Segformer - CityScapes - Input dims appear to be 224x224

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) A6000
• DeepStream Version N/A
• JetPack Version (valid for Jetson only) N/A
• TensorRT Version 24.01 (Docker container)
• NVIDIA GPU Driver Version (valid for GPU only) 535.154.05
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) Run triton on CityScapes models
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi

I’ve been using the CitySemSeg .etlt model (1080x1920) with DeepStream in a C++ pipeline, and all is working fine. I noticed there are new models called Pre-trained Segformer - CityScapes with ONNX models (they appear to be annotated _224). I converted those to TensorRT and served them from Triton Inference Server. I’m using the Python client, but I had to adjust the input tensors to [3,224,224]; I originally had them at [3,1024,1024] as suggested by the model narrative. The reduction in input shape makes the output tensors poor in appearance - are there [3,1024,1024] versions?
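For reference, this is roughly how I’m calling the model from the Python client once the engine is served by Triton (a minimal sketch; the model name "segformer_cityscapes" and the URL are placeholders for my setup, and the tensor names "input"/"output" are what the converted engine reports):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Image already preprocessed/resized to CHW float32, 224x224, with a batch dim
img = np.random.rand(1, 3, 224, 224).astype(np.float32)

inp = httpclient.InferInput("input", list(img.shape), "FP32")
inp.set_data_from_numpy(img)
out = httpclient.InferRequestedOutput("output")

result = client.infer("segformer_cityscapes", inputs=[inp], outputs=[out])
mask = result.as_numpy("output")  # int32 class-index map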

Using DeepStream instead doesn’t make much difference, as the input shape is still [3,224,224].

Cheers

Are you talking about this? Pre-trained Segformer - CityScapes | NVIDIA NGC

Hi Fiona,

Yes, those models appear relatively new, and the narrative suggests that the input tensors are dimensioned as [3,1024,1024] whereas they are actually [3,224,224].

Cheers

@IainA
Which one did you use from Pre-trained Segformer - CityScapes | NVIDIA NGC?

Hi

All the deployable ones (ONNX). They all seem to take 224 input, and the file names all have _224 suffixes.

Cheers

In NGC, there are no [3,1024,1024] versions.
I will sync internally. Can you double-check the [3,224,224] deploy models with the tao-pytorch inference command?

Hi Morgan,

Triton Inference Server reports that [3,224,224] input is required when I pass a [3,1024,1024] image (as per the narrative), and when I pass a [3,224,224] image it handles it correctly.
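The expected dims can also be confirmed by querying the model metadata from the Python client (a sketch; the model name is a placeholder for my setup):

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
meta = client.get_model_metadata("segformer_cityscapes")  # placeholder model name
print(meta["inputs"])   # tensor names, dtypes and shapes Triton expects
print(meta["outputs"])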

What is the difference between these models and the CitySemSegFormer models (citysemsegformer)?

For a [3,1024,1024] image, it can handle it correctly, right?
For a [3,224,224] image, it can also handle it correctly, right?

In CitySemSegformer | NVIDIA NGC, there are two kinds of backbones.
One is fan_base_16_p4_hybrid. The onnx input is 3x1024x1024.
The other is mit_b5.

In Pre-trained Segformer - CityScapes | NVIDIA NGC, the backbones are based on the FAN series. The input is 3x224x224.

Hi Morgan

Sorry for the delay and any confusion. As far as I can tell, all the deploy models expect input tensors of [3,224,224]. I could not get any of the deploy models to work with [3,1024,1024] images, so it looks like only the _224 models were uploaded. For the CitySemSegformer model from NGC, I can get any size of image to work.

Hope that clarifies.

Cheers

Got it. It makes sense.

Seems to be a model request. I will sync internally.

Hi @IainA
Please try downloading the model and exporting it to the ONNX file you expect.
For example, download Pre-trained Segformer - CityScapes | NVIDIA NGC, then run tao model export to generate an ONNX file.

segformer export -e /home/morganh/demo_3.0/forum_repro/segformer/spec.yaml export.checkpoint=cityscapes_fan_tiny_hybrid_224.pth export.onnx_file=1024_1024.onnx -r result

Spec file:

export:
  input_height: 1024
  input_width: 1024
  input_channel: 3
model:
  backbone:
    type: "fan_tiny_8_p4_hybrid"
dataset:
  img_norm_cfg:
      mean:
          - 123.675
          - 116.28
          - 103.53
      std:
          - 58.395
          - 57.12
          - 57.375
      to_rgb: true
  test_dataset:
      img_dir: /home/morganh/demo_2.0/unet/data/cityscapes/gtFine/train
      ann_dir: /home/morganh/demo_2.0/unet/data/cityscapes/gtFine/train
      pipeline:
        augmentation_config:
          resize:
            keep_ratio: True
  input_type: "rgb"
  data_root: /home/morganh/demo_2.0/unet/data/cityscapes/gtFine
  palette:
    - seg_class: road
      rgb:
        - 128
        - 64
        - 128
      label_id: 7
      mapping_class: road
    - seg_class: sidewalk
      rgb:
        - 244
        - 35
        - 232
      label_id: 8
      mapping_class: sidewalk
    - seg_class: building
      rgb:
        - 70
        - 70
        - 70
      label_id: 11
      mapping_class: building
    - seg_class: wall
      rgb:
        - 102
        - 102
        - 102
      label_id: 12
      mapping_class: wall
    - seg_class: fence
      rgb:
        - 190
        - 153
        - 153
      label_id: 13
      mapping_class: fence
    - seg_class: pole
      rgb:
        - 153
        - 153
        - 153
      label_id: 17
      mapping_class: pole
    - seg_class: traffic light
      rgb:
        - 250
        - 170
        - 30
      label_id: 19
      mapping_class: traffic light
    - seg_class: traffic sign
      rgb:
        - 220
        - 220
        - 0
      label_id: 20
      mapping_class: traffic sign
    - seg_class: vegetation
      rgb:
        - 107
        - 142
        - 35
      label_id: 21
      mapping_class: vegetation
    - seg_class: terrain
      rgb:
        - 152
        - 251
        - 152
      label_id: 22
      mapping_class: terrain
    - seg_class: sky
      rgb:
        - 70
        - 130
        - 180
      label_id: 23
      mapping_class: sky
    - seg_class: person
      rgb:
        - 220
        - 20
        - 60
      label_id: 24
      mapping_class: person
    - seg_class: rider
      rgb:
        - 255
        - 0
        - 0
      label_id: 25
      mapping_class: rider
    - seg_class: car
      rgb:
        - 0
        - 0
        - 142
      label_id: 26
      mapping_class: car
    - seg_class: truck
      rgb:
        - 0
        - 0
        - 70
      label_id: 27
      mapping_class: car
    - seg_class: bus
      rgb:
        - 0
        - 60
        - 100
      label_id: 28
      mapping_class: bus
    - seg_class: train
      rgb:
        - 0
        - 80
        - 100
      label_id: 31
      mapping_class: train
    - seg_class: motorcycle
      rgb:
        - 0
        - 0
        - 230
      label_id: 32
      mapping_class: motorcycle
    - seg_class: bicycle
      rgb:
        - 119
        - 11
        - 32
      label_id: 33
      mapping_class: bicycle
  workers_per_gpu: 1
  batch_size: -1

Thanks @Morganh - I will try it and report back with results.
Cheers

Hi @Morganh

I created the ONNX file and used tao deploy (rather than trtexec) to produce the TRT model.plan, using your spec file with modifications to point to my filesystem/container.

When I run polygraphy inspect model model.plan (from within a TRT container - tensorrt:24.01-py3) I get the following output:

[I] Loading bytes from /trt_optimize/model.plan
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 1 Engine Input(s) ----
{input [dtype=float32, shape=(-1, 3, 1024, 1024)]}

---- 1 Engine Output(s) ----
{output [dtype=int32, shape=(1, -1, 1024, 1024)]}

---- Memory ----
Device Memory: 21979332608 bytes

---- 1 Profile(s) (2 Tensor(s) Each) ----
- Profile: 0
    Tensor: input           (Input), Index: 0 | Shapes: min=(1, 3, 1024, 1024), opt=(8, 3, 1024, 1024), max=(8, 3, 1024, 1024)
    Tensor: output         (Output), Index: 1 | Shape: (1, -1, 1024, 1024)

I’m assuming that the output dimension of -1 (dynamic) means the output channels are configured at runtime, and that it will output 3 channels when the input has 3 channels?

I’m also assuming I can get the segmentation colour for each class from the spec file you provided above?
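Something like this is what I had in mind for building the colour lookup from that palette (a sketch assuming the network’s output class indices follow the order of the palette entries, i.e. 0 = road, 1 = sidewalk, ...):

import numpy as np
import yaml

with open("spec.yaml") as f:                       # the spec file quoted above
    spec = yaml.safe_load(f)

palette = spec["dataset"]["palette"]
lut = np.zeros((len(palette), 3), dtype=np.uint8)  # class index -> RGB
for idx, entry in enumerate(palette):
    lut[idx] = entry["rgb"]

def colorize(mask):
    # mask: int32 array of shape (H, W) holding class indices from the engine output
    return lut[mask]                               # (H, W, 3) uint8 RGB image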

Thank you for all your help with this.

Cheers

Could you use trtexec to generate the TensorRT engine to double-check? Refer to TRTEXEC with Segformer - NVIDIA Docs.

Hi @Morganh
I used the trtexec that is exposed in the tao deploy container using:

!tao deploy segformer run trtexec --onnx=$SPECS_DIR/1024_1024.onnx \
    --maxShapes=input:16x3x1024x1024 \
    --minShapes=input:1x3x1024x1024 \
    --optShapes=input:8x3x1024x1024 \
    --fp16 \
    --saveEngine=$SPECS_DIR/model.plan

The 1024_1024.onnx file was generated using the spec file you provided above. I then used polygraphy by using:

!tao deploy segformer run polygraphy inspect model $SPECS_DIR/model.plan

And got the following output:

[I] Loading bytes from /workspace/tao-experiments/specs/model.plan
[I] ==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 1 Engine Input(s) ----
{input [dtype=float32, shape=(-1, 3, 1024, 1024)]}

---- 1 Engine Output(s) ----
{output [dtype=int32, shape=(1, -1, 1024, 1024)]}

---- Memory ----
Device Memory: 14773387264 bytes

---- 1 Profile(s) (2 Tensor(s) Each) ----
- Profile: 0
    Tensor: input           (Input), Index: 0 | Shapes: min=(1, 3, 1024, 1024), opt=(8, 3, 1024, 1024), max=(16, 3, 1024, 1024)
    Tensor: output         (Output), Index: 1 | Shape: (1, -1, 1024, 1024)
---- 254 Layer(s) ----

So I expected the dynamic dimension (-1) to cope with a 3-channel input tensor by producing a 3-channel output tensor, but it appears to give a grayscale output (i.e. 1x1024x1024).

Any further thoughts? Thank you.

I will try on my side as well. Thanks for the info.


Hi @IainA
I ran the steps below.

$ /usr/src/tensorrt/bin/trtexec --onnx=citysemsegformer_fan.onnx --minShapes=input:1x3x1024x1024 --optShapes=input:1x3x1024x1024 --maxShapes=input:1x3x1024x1024 --saveEngine=fp32.engine

The result is as below.

# polygraphy inspect model fp32.engine
[I] Loading bytes from /home/morganh/demo_3.0/forum_repro/segformer/fp32.engine
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 1 Engine Input(s) ----
    {input [dtype=float32, shape=(1, 3, 1024, 1024)]}

    ---- 1 Engine Output(s) ----
    {output [dtype=int32, shape=(1, 1, 1024, 1024)]}

    ---- Memory ----
    Device Memory: 2192719872 bytes

    ---- 1 Profile(s) (2 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: input]  | Shapes: min=(1, 3, 1024, 1024), opt=(1, 3, 1024, 1024), max=(1, 3, 1024, 1024)
        Binding Index: 1 (Output) [Name: output] | Shape: (1, 1, 1024, 1024)

    ---- 341 Layer(s) ----

If I set different shapes:

$ /usr/src/tensorrt/bin/trtexec --onnx=citysemsegformer_fan.onnx --minShapes=input:2x3x1024x1024 --optShapes=input:3x3x1024x1024 --maxShapes=input:4x3x1024x1024 --saveEngine=fp32_dynamic.engine

$polygraphy inspect model fp32_dynamic.engine
[I] Loading bytes from /home/morganh/demo_3.0/forum_repro/segformer/fp32_dynamic.engine
[I] ==== TensorRT Engine ====
    Name: Unnamed Network 0 | Explicit Batch Engine

    ---- 1 Engine Input(s) ----
    {input [dtype=float32, shape=(-1, 3, 1024, 1024)]}

    ---- 1 Engine Output(s) ----
    {output [dtype=int32, shape=(1, -1, 1024, 1024)]}

    ---- Memory ----
    Device Memory: 9035645440 bytes

    ---- 1 Profile(s) (2 Binding(s) Each) ----
    - Profile: 0
        Binding Index: 0 (Input)  [Name: input]  | Shapes: min=(2, 3, 1024, 1024), opt=(3, 3, 1024, 1024), max=(4, 3, 1024, 1024)
        Binding Index: 1 (Output) [Name: output] | Shape: (1, -1, 1024, 1024)

    ---- 343 Layer(s) ----

So, the -1 is related to the batch size.
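In other words, following the note above that the -1 corresponds to the batch size, each slice along that dimension of the dynamic engine’s output is one [1024,1024] int32 class-index map, one per image in the batch. A minimal sketch of reading it out (the output array here is just a placeholder for the actual engine/Triton output buffer):

import numpy as np

output = np.zeros((1, 4, 1024, 1024), dtype=np.int32)  # placeholder: engine output for a batch of 4

class_maps = output[0]                 # shape (N, 1024, 1024), one class-index map per image
for i, class_map in enumerate(class_maps):
    print(i, class_map.shape, np.unique(class_map)[:5])

Each class map can then be coloured with the palette lookup sketched earlier in the thread.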
