Question about the Pretrained Segformer in NGC

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc) Both dGPU and Jetson
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Segformer
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (If you have one, please share it here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

  1. I would like to use the Segformer pretrained model provided on NGC. Can you tell me the input image configuration it was trained with, for example the fields below? Knowing this information would make it easier to get effective results from the pretrained weights.

    input_type: "rgb" # replace with "rgb" for color images
      img_norm_cfg:
            mean:
              - 127.5
              - 127.5
              - 127.5
            std:
              - 127.5
              - 127.5
              - 127.5
            to_rgb: True
    
  2. Looking at the Segformer description on NGC, there are no separate benchmark results for Jetson. Can I conclude that the FAN backbone-based model is not suitable for use on Jetson devices?

    These are the benchmark results from my test on an NVIDIA GeForce GTX 1660 SUPER. The engine is FP16, built from the deployable tiny model downloaded from NGC.

    trtexec --loadEngine=./cityscapes_fan_tiny_hybrid_224.engine --shapes=input:4x3x224x224 --avgRuns=1000
    Latency: min = 21.3061 ms, max = 27.2076 ms, mean = 22.1975 ms, median = 21.7029 ms, percentile(90%) = 23.063 ms, percentile(95%) = 25.4922 ms, percentile(99%) = 27.0663 ms
    

    Considering that the mean latency is about 22 ms, I think it would not be easy to use on a Jetson AGX board. Is my interpretation correct?

  3. The deployable model is an ONNX file. Can you tell me specifically how it was exported from the trainable model? I wonder whether it was created simply by specifying the model input size and the dynamic-shape option.

Thank you

Also, besides the basic FAN backbone, is there an MiT-based pretrained model?

Please refer to the configuration below.

    img_norm_cfg:
        mean:
          - 123.675
          - 116.28
          - 103.53
        std:
          - 58.395
          - 57.12
          - 57.375
        to_rgb: true
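
For reference, here is a minimal Python sketch of how that normalization would typically be applied before inference. The cv2-based loading and the 224x224 size are my own assumptions for illustration, not the exact TAO inference code:

    import cv2
    import numpy as np

    # Values from the img_norm_cfg above (ImageNet statistics).
    MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
    STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

    def preprocess(image_path: str, size: int = 224) -> np.ndarray:
        """Load an image, convert BGR->RGB (to_rgb: true), resize,
        normalize, and return an NCHW float32 tensor."""
        bgr = cv2.imread(image_path)
        rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
        rgb = cv2.resize(rgb, (size, size)).astype(np.float32)
        rgb = (rgb - MEAN) / STD
        chw = np.transpose(rgb, (2, 0, 1))   # HWC -> CHW
        return np.expand_dims(chw, 0)        # add batch dimension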

More info for palette.

  palette:
    - seg_class: road
      rgb:
        - 128
        - 64
        - 128
      label_id: 7
      mapping_class: road
    - seg_class: sidewalk
      rgb:
        - 244
        - 35
        - 232
      label_id: 8
      mapping_class: sidewalk
    - seg_class: building
      rgb:
        - 70
        - 70
        - 70
      label_id: 11
      mapping_class: building
    - seg_class: wall
      rgb:
        - 102
        - 102
        - 102
      label_id: 12
      mapping_class: wall
    - seg_class: fence
      rgb:
        - 190
        - 153
        - 153
      label_id: 13
      mapping_class: fence
    - seg_class: pole
      rgb:
        - 153
        - 153
        - 153
      label_id: 17
      mapping_class: pole
    - seg_class: traffic light
      rgb:
        - 250
        - 170
        - 30
      label_id: 19
      mapping_class: traffic light
    - seg_class: traffic sign
      rgb:
        - 220
        - 220
        - 0
      label_id: 20
      mapping_class: traffic sign
    - seg_class: vegetation
      rgb:
        - 107
        - 142
        - 35
      label_id: 21
      mapping_class: vegetation
    - seg_class: terrain
      rgb:
        - 152
        - 251
        - 152
      label_id: 22
      mapping_class: terrain
    - seg_class: sky
      rgb:
        - 70
        - 130
        - 180
      label_id: 23
      mapping_class: sky
    - seg_class: person
      rgb:
        - 220
        - 20
        - 60
      label_id: 24
      mapping_class: person
    - seg_class: rider
      rgb:
        - 255
        - 0
        - 0
      label_id: 25
      mapping_class: rider
    - seg_class: car
      rgb:
        - 0
        - 0
        - 142
      label_id: 26
      mapping_class: car
    - seg_class: truck
      rgb:
        - 0
        - 0
        - 70
      label_id: 27
      mapping_class: truck
    - seg_class: bus
      rgb:
        - 0
        - 60
        - 100
      label_id: 28
      mapping_class: bus
    - seg_class: train
      rgb:
        - 0
        - 80
        - 100
      label_id: 31
      mapping_class: train
    - seg_class: motorcycle
      rgb:
        - 0
        - 0
        - 230
      label_id: 32
      mapping_class: motorcycle
    - seg_class: bicycle
      rgb:
        - 119
        - 11
        - 32
      label_id: 33
      mapping_class: bicycle
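
If it helps, here is a small Python sketch (my own illustration, using only a subset of the classes above) of how the palette maps label_id values in a predicted mask to RGB colors:

    import numpy as np

    # Subset of the palette above: label_id -> (R, G, B).
    # The full mapping follows the Cityscapes IDs listed in the spec.
    PALETTE = {
        7: (128, 64, 128),    # road
        8: (244, 35, 232),    # sidewalk
        11: (70, 70, 70),     # building
        24: (220, 20, 60),    # person
        26: (0, 0, 142),      # car
    }

    def colorize(mask: np.ndarray) -> np.ndarray:
        """Map an (H, W) array of label_ids to an (H, W, 3) RGB image."""
        out = np.zeros((*mask.shape, 3), dtype=np.uint8)
        for label_id, rgb in PALETTE.items():
            out[mask == label_id] = rgb
        return out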

You can find some perf results in Overview - NVIDIA Docs.

BTW, you can use the formula below to calculate fps.
fps = bs * 1000 / <GPU Compute Time in ms>

From your command, bs is 4, so the throughput is about 180 fps (4 * 1000 / 22.1975).
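
The same arithmetic in Python, using the mean latency reported above as a stand-in for the GPU Compute Time from trtexec:

    def throughput_fps(batch_size: int, gpu_compute_ms: float) -> float:
        # fps = bs * 1000 / <GPU Compute Time in ms>
        return batch_size * 1000.0 / gpu_compute_ms

    print(throughput_fps(4, 22.1975))  # ~180 fps for the trtexec run above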

Refer to SegFormer - NVIDIA Docs
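
For a rough idea of what "input size plus dynamic option" means, below is a generic PyTorch sketch of a dynamic-batch ONNX export. This is only my illustration; the actual TAO export pipeline is described in the docs above and may differ:

    import torch
    import torch.nn as nn

    def export_dynamic_onnx(model: nn.Module, onnx_path: str, size: int = 224) -> None:
        """Export a model to ONNX with a fixed HxW and a dynamic batch axis."""
        model.eval()
        dummy = torch.randn(1, 3, size, size)
        torch.onnx.export(
            model,
            dummy,
            onnx_path,
            input_names=["input"],
            output_names=["output"],
            dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
            opset_version=13,
        )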

Yes, there is. See CitySemSegformer | NVIDIA NGC.

However, if I actually assume real-time processing from 4 cameras, then in my specific case the effective rate per camera is about 1000 / 22.1975 ≈ 45 fps, right?

It might be difficult to use on Jetson :)

Yes, right.