Dino FAN small inference time is high

Please provide the following information when requesting support.

• Hardware (T4)
• Network Type (Dino FAN small)
• How to reproduce the issue ?
Convert the .pth weights to ONNX (config attached below), convert the ONNX model to TensorRT, then run inference inside a Docker container on a T4 GPU in a cloud VM, with batch size 1.

When running inference (Dino FAN small, FP32) on 80 images, it took around 45 seconds, which is very close to the inference time of Dino FAN large FP32.
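For reference, the figures quoted above (80 images in roughly 45 seconds) work out to the following per-image numbers. This is just arithmetic on the measurement already stated, done here with awk:

```shell
# Per-image latency and throughput from the measurement above:
# 80 images processed in ~45 seconds.
awk 'BEGIN { imgs = 80; secs = 45;
             printf "%.1f ms/image, %.2f images/s\n", secs/imgs*1000, imgs/secs }'
# prints: 562.5 ms/image, 1.78 images/s
```

Note this is end-to-end wall-clock time, so it includes preprocessing and any host-device transfers, not just GPU compute.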

But according to the NVIDIA docs below, Dino FAN small should be around 2 times faster than Dino FAN large.

The same holds for all the other small Dino variants compared with Dino FAN large.
Could you please advise why that is and guide me on this?

Configuration used for .pth weight to onnx conversion

  gpu_id: 0
  input_width: 960
  input_height: 544
  opset_version: 17
  on_cpu: False
  num_classes: 91
  batch_size: -1
  backbone: fan_small
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048

Configuration used for onnx to tensorRT

  gpu_id: 0
  input_width: 960
  input_height: 544
  data_type: fp32
  workspace_size: 4096
  min_batch_size: 1
  opt_batch_size: 8
  max_batch_size: 8
  num_classes: 91
  batch_size: 1
  backbone: fan_small

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Please use trtexec to profile. You can refer to TRTEXEC with DINO - NVIDIA Docs.
For example, run the command below against each of the two ONNX files.

trtexec --onnx=/path/to/model.onnx \
        --minShapes=inputs:1x3x544x960 \
        --optShapes=inputs:8x3x544x960 \
        --maxShapes=inputs:16x3x544x960

Then check the log. Its performance summary includes a "GPU Compute Time" entry, which you can compare between the two models.
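To pull that number out quickly, you can grep the trtexec output. The log excerpt below is hypothetical (the timing values are invented for illustration), but it shows the general shape of the summary lines and how a case-insensitive grep isolates the compute-time entry:

```shell
# Hypothetical trtexec log excerpt (timings invented for illustration),
# piped through grep to isolate the compute-time line.
printf '%s\n' \
  '[I] Latency: min = 12.3 ms, max = 15.0 ms, mean = 13.1 ms' \
  '[I] GPU Compute Time: min = 11.8 ms, max = 14.5 ms, mean = 12.6 ms' |
  grep -i 'compute'
# prints only the "GPU Compute Time" line
```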

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.