DINOv2 TensorRT model performance issue

Description

I converted the DINOv2 embedding model (ViT-S/14) from PyTorch to ONNX and then to TensorRT.
I extracted embeddings from the following cropped images:

  1. drink image (regular position)
  2. drink image (rotated position)
  3. chips image (regular position)
  4. chips image (rotated position)

The distance between the embeddings of drink vs. drink rotated is ~0.4 (the same for chips vs. chips rotated).
The distance between the embeddings of drink vs. chips is ~0.75.
This makes sense, because I expect a higher distance between different products and a lower distance between the same product.
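
(The distances above are computed between the embedding vectors; a cosine-distance calculation like the sketch below is assumed here, with `emb_a` and `emb_b` as placeholder names for two embeddings.)

```python
import numpy as np

def cosine_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """1 - cosine similarity: ~0 for near-identical embeddings, larger for dissimilar ones."""
    emb_a = emb_a / np.linalg.norm(emb_a)
    emb_b = emb_b / np.linalg.norm(emb_b)
    return float(1.0 - np.dot(emb_a, emb_b))
```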

The above test was based on the Torch and ONNX DINOv2 models.
But when using the TensorRT model I'm getting the following distances:
Distance between drink and chips is ~0.53
Distance between drink and drink rotated is ~0.42
These are too close together; I expect the same kind of separation that the Torch and ONNX models provide.

Any ideas? Maybe the TensorRT conversion was done incorrectly?
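
To narrow it down, I can also compare the two backends directly on one and the same preprocessed crop, roughly like the sketch below (`emb_onnx` and `emb_trt` are placeholder names for the embedding returned by ONNX Runtime and by the TensorRT engine for that crop):

```python
import numpy as np

def compare_backends(emb_onnx: np.ndarray, emb_trt: np.ndarray) -> None:
    """Compare the embeddings of the *same* preprocessed crop from both backends."""
    abs_diff = np.abs(emb_onnx - emb_trt)
    print("max abs diff :", abs_diff.max())
    print("mean abs diff:", abs_diff.mean())
    cos = np.dot(emb_onnx, emb_trt) / (np.linalg.norm(emb_onnx) * np.linalg.norm(emb_trt))
    print("cosine similarity between backends:", cos)
```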

Environment

TensorRT Version: 10.3.0
GPU Type: NVIDIA Jetson Orin Nano (integrated GPU)
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6
Operating System + Version: Jetson Orin Nano Developer Kit, JetPack 6.0
Python Version (if applicable): 3.10.12

Conversion record:

  1. Torch to ONNX conversion:

```python
import torch

# Wrap the model to exclude the masks input during export
class DINOEmbeddingExtractor(torch.nn.Module):
    def __init__(self, model):
        super(DINOEmbeddingExtractor, self).__init__()
        self.model = model

    def forward(self, x):
        # Forward only the required input and ignore any mask inputs
        return self.model(x)

# Load your DINOv2 model
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14", source="github")
model.eval()

# Wrap the model
embedding_extractor = DINOEmbeddingExtractor(model)

# Define a dummy input for the ONNX export
dummy_input = torch.randn(1, 3, 224, 224)

# Export the wrapped model to ONNX without the mask input
torch.onnx.export(
    embedding_extractor,
    dummy_input,
    "dinov2_vit_s14_no_masks.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    opset_version=17,
)
print("Model successfully exported to dinov2_vit_s14_no_masks.onnx")
```
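
To rule out the export step, the ONNX file can be checked numerically against the wrapped PyTorch model. A minimal sketch, continuing from the variables above and assuming `onnxruntime` is installed:

```python
import numpy as np
import onnxruntime as ort

# Reference embedding from the wrapped PyTorch model
with torch.no_grad():
    ref = embedding_extractor(dummy_input).numpy()

# Same input through the exported ONNX graph
sess = ort.InferenceSession("dinov2_vit_s14_no_masks.onnx",
                            providers=["CPUExecutionProvider"])
out = sess.run(["output"], {"input": dummy_input.numpy()})[0]

print("max abs diff (torch vs onnxruntime):", np.abs(ref - out).max())
```

The Torch and ONNX models give the same distance behaviour in my test above, so the suspicion falls on the ONNX-to-TensorRT step.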

  2. ONNX to TensorRT conversion:

```
/usr/src/tensorrt/bin/trtexec --onnx=/home/shraga/workspace/projects/kanduai-express-checkout/inference_testing_and_convertions/embeddings/dinov2_vit_s14_no_masks.onnx --saveEngine=/home/shraga/workspace/projects/kanduai-express-checkout/inference_testing_and_convertions/embeddings/dinov2_vit_s14_no_masks_fp32.trt
```
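
For context, here is a sketch of how an embedding can be pulled out of the resulting engine with the TensorRT 10 Python API. It is a simplified illustration rather than my exact inference code; it assumes pycuda is available and that the input is preprocessed exactly as in the PyTorch pipeline:

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("dinov2_vit_s14_no_masks_fp32.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# TensorRT 10 uses name-based tensor I/O; names match the ONNX export above
input_name, output_name = "input", "output"
context.set_input_shape(input_name, (1, 3, 224, 224))
out_shape = tuple(context.get_tensor_shape(output_name))

# Placeholder input; in practice this is the preprocessed crop
# (float32, NCHW, same resize/normalization as the PyTorch path)
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
output = np.empty(out_shape, dtype=np.float32)

d_input = cuda.mem_alloc(x.nbytes)
d_output = cuda.mem_alloc(output.nbytes)
context.set_tensor_address(input_name, int(d_input))
context.set_tensor_address(output_name, int(d_output))

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, x, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()

embedding = output[0]  # 384-dimensional ViT-S/14 embedding
```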

Hi @david476,
Do you mind sharing the model with us?