Wrong output when converting Depth Anything V2 ONNX model to TensorRT

Description

I got the ONNX model of Depth Anything V2 from here and converted it to a TensorRT engine with FP16 precision. However, the result it produces is wrong.

Environment

TensorRT Version: 10.3.0.30
GPU Type: NVIDIA Jetson Orin Nano 8GB
Nvidia Driver Version: 540.4.0
CUDA Version: 12.6.68
CUDNN Version: 9.3.0.75
Operating System + Version: Jetpack 6.2
Python Version (if applicable): 3.10.12
PyTorch Version (if applicable): 2.6.0+cpu
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

This Google Drive link contains both the ONNX and TRT models, along with the test script and the images used.

Steps To Reproduce

  1. Convert the model to a TensorRT engine with this command: /usr/src/tensorrt/bin/trtexec --onnx=depth.onnx --saveEngine=depth.trt --optShapes=image:1x3x518x518 --fp16
  2. Run the script with: python test.py
  3. Check the output
xFormers not available
xFormers not available
[03/26/2025-10:51:09] [TRT] [I] Loaded engine size: 50 MiB
[03/26/2025-10:51:09] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +41, now: CPU 0, GPU 88 (MiB)

TensorRT Output:
 [[0.17358398 0.16516113 0.16967773 ... 0.23657227 0.23999023 0.21948242]
 [0.1730957  0.17041016 0.17041016 ... 0.23022461 0.2331543  0.22802734]
 [0.17138672 0.16931152 0.16796875 ... 0.23022461 0.2331543  0.22631836]
 ...
 [4.2851562  4.2851562  4.296875   ... 4.1445312  4.15625    4.15625   ]
 [4.3125     4.296875   4.3046875  ... 4.1835938  4.1835938  4.1679688 ]
 [4.2421875  4.2890625  4.25       ... 4.1679688  4.1875     4.1914062 ]]

PyTorch Output:
 [[0.        0.        0.        ... 0.        0.        0.       ]
 [0.        0.        0.        ... 0.        0.        0.       ]
 [0.        0.        0.        ... 0.        0.        0.       ]
 ...
 [6.896378  6.912275  6.89874   ... 5.4782257 5.472147  5.487279 ]
 [6.9368024 6.924689  6.922694  ... 5.5279803 5.5265183 5.5185995]
 [6.8403196 6.8959966 6.854626  ... 5.5324764 5.548835  5.5438104]]
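
For anyone who cannot open the Drive link: the TensorRT half of test.py does roughly the following (a simplified sketch, not the exact script; the preprocessed frame is replaced with a random tensor here, the I/O tensor names are read from the engine rather than assumed, and the output is assumed to be float32):

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
with open("depth.trt", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Resolve I/O tensor names from the engine instead of hard-coding them (TensorRT 10 API).
names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
inp = next(n for n in names if engine.get_tensor_mode(n) == trt.TensorIOMode.INPUT)
out = next(n for n in names if engine.get_tensor_mode(n) == trt.TensorIOMode.OUTPUT)

# Stand-in for the real preprocessed 518x518 image.
image = np.random.rand(1, 3, 518, 518).astype(np.float32)
context.set_input_shape(inp, image.shape)
result = np.empty(tuple(context.get_tensor_shape(out)), dtype=np.float32)

d_in, d_out = cuda.mem_alloc(image.nbytes), cuda.mem_alloc(result.nbytes)
stream = cuda.Stream()
context.set_tensor_address(inp, int(d_in))
context.set_tensor_address(out, int(d_out))

cuda.memcpy_htod_async(d_in, image, stream)
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(result, d_out, stream)
stream.synchronize()
print("TensorRT output:\n", result.squeeze())

The PyTorch side of the script simply runs the original Depth Anything V2 checkpoint on the same input and prints its output for comparison.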

I am checking this. In the meantime, please check whether the following pointers help:

  1. Expecting bitwise-identical results between TensorRT and frameworks like PyTorch or TensorFlow is unrealistic, even with FP32 precision. The order of floating-point computations can lead to minor differences, and these shouldn't significantly impact the accuracy metrics of your application. A quick way to quantify the mismatch is sketched right after this list.
  2. A significant drop in metrics (like mAP for object detection) could indicate an issue with the TensorRT model. It's essential to compare performance carefully to understand any discrepancies.
  3. If you encounter NaNs or infinities in the outputs while using FP16/BF16 precision, intermediate outputs may be overflowing. To address this (see the second sketch after this list), consider:
  • Keeping precision-sensitive ONNX operations at their original precision.
  • Ensuring that the model does not use layers like InstanceNormalization, which have known performance decreases when converted.
  4. Understanding how TensorRT quantizes models is crucial. Post-Training Quantization (PTQ) can generate a calibration cache that might help. For better results, consider performing Quantization-Aware Training (QAT) in a deep learning framework before exporting to ONNX.
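
As a concrete version of pointer 1, the mismatch can be quantified instead of eyeballed. A minimal sketch (report_mismatch is a hypothetical helper; its arguments are the two arrays that test.py already computes, and the epsilon is only there because the PyTorch output contains all-zero rows):

import numpy as np

def report_mismatch(trt_out: np.ndarray, torch_out: np.ndarray) -> None:
    # Absolute and relative error statistics between the two depth maps.
    abs_err = np.abs(trt_out - torch_out)
    rel_err = abs_err / (np.abs(torch_out) + 1e-6)
    print(f"max abs: {abs_err.max():.4f}  mean abs: {abs_err.mean():.4f}  "
          f"max rel: {rel_err.max():.4f}")

An FP16 engine that is merely losing precision typically shows a small mean absolute error; a gap as large as the one between the two arrays above is far beyond normal FP16 rounding.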
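And for pointer 3, here is one way to keep accuracy-sensitive layers in FP32 while the rest of the network runs in FP16 (a sketch using the TensorRT Python API; which layer types to pin is an assumption you would tune per model, and the same effect is available from trtexec via --precisionConstraints=obey together with --layerPrecisions):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in TensorRT 10
parser = trt.OnnxParser(network, logger)
with open("depth.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Shape profile matching the trtexec command from the repro steps.
profile = builder.create_optimization_profile()
profile.set_shape("image", (1, 3, 518, 518), (1, 3, 518, 518), (1, 3, 518, 518))
config.add_optimization_profile(profile)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Normalization/softmax layers are common FP16 trouble spots in ViT-style models.
    if layer.type in (trt.LayerType.NORMALIZATION, trt.LayerType.SOFTMAX):
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

with open("depth_mixed.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))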

Please let me know if this helps.

Hello @AakankshaS, thank you for your answer.

  1. I'm aware that identical results are unrealistic, but I was expecting the error to be on the order of ±0.x rather than what is shown above.
  2. I do not have any performance metrics right now, but the script does save the depth images from both models, and the TensorRT version shows noticeably less detail.
  3. I do not see any NaNs or infinities in the outputs.
  4. I will check this, thanks.
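
(Side note for anyone debugging a similar mismatch: Polygraphy can compare the same ONNX model under TensorRT and ONNX Runtime with explicit tolerances, e.g. polygraphy run depth.onnx --onnxrt --trt --fp16 --input-shapes image:[1,3,518,518] --atol 0.1 --rtol 0.1; running it with and without --fp16 helps tell whether the accuracy loss comes from reduced precision or from the conversion itself.)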