Please provide the following information when requesting support.
• Hardware (T4)
• Network Type (Dino FAN small)
• How to reproduce the issue?
Convert the .pth weights to ONNX (config attached below), convert the ONNX model to a TensorRT engine, and run inference inside a Docker container on a T4 GPU in a cloud VM, with batch size 1.
Running inference with DINO FAN small (FP32) on 80 images takes around 45 seconds (~0.56 s per image), which is very close to the inference time of DINO FAN large (FP32).
However, according to the NVIDIA docs referenced below, DINO FAN small should be roughly 2x faster than DINO FAN large.
The same is true for all the other small DINO variants compared to DINO FAN large.
Could you please advise why this is the case and guide me on how to resolve it?
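For reference, here is roughly how I plan to isolate the pure engine latency with trtexec, so that data loading and post-processing are excluded (the engine filename and the input tensor name "inputs" are assumptions from my export and may need adjusting):

trtexec --loadEngine=dino_fan_small_fp32.engine \
    --shapes=inputs:1x3x544x960 \
    --warmUp=1000 --iterations=100 --avgRuns=100

Running the same command against the DINO FAN large engine should show whether the gap is really in GPU compute time or in my pre/post-processing.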
Configuration used for the .pth-to-ONNX conversion:
export:
  gpu_id: 0
  input_width: 960
  input_height: 544
  opset_version: 17
  on_cpu: False
dataset:
  num_classes: 91
  batch_size: -1
model:
  backbone: fan_small
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  num_select: 100
  dropout_ratio: 0.0
  dim_feedforward: 2048
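For completeness, the export step was run roughly as below (a sketch assuming the TAO Toolkit 5.x launcher CLI; the spec, checkpoint, and output paths are placeholders for my setup):

tao model dino export -e /workspace/specs/export.yaml \
    export.checkpoint=/workspace/results/train/dino_fan_small.pth \
    export.onnx_file=/workspace/results/export/dino_fan_small.onnx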
Configuration used for the ONNX-to-TensorRT engine conversion:
gen_trt_engine:
  gpu_id: 0
  input_width: 960
  input_height: 544
  tensorrt:
    data_type: fp32
    workspace_size: 4096
    min_batch_size: 1
    opt_batch_size: 8
    max_batch_size: 8
dataset:
  num_classes: 91
  batch_size: 1
model:
  backbone: fan_small
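The engine itself was generated roughly as below (again a sketch assuming the TAO 5.x tao-deploy CLI; paths are placeholders):

tao deploy dino gen_trt_engine -e /workspace/specs/gen_trt_engine.yaml \
    gen_trt_engine.onnx_file=/workspace/results/export/dino_fan_small.onnx \
    gen_trt_engine.trt_engine=/workspace/results/export/dino_fan_small_fp32.engine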