The same TensorRT engine produces different inference results when loaded in Python vs. C++

Description

I have a regression model that predicts face quality from input images. Initially, I trained the model using PyTorch and exported it to ONNX format. Then, I converted the ONNX model to a TensorRT engine using the provided onnx_to_tensorrt.py script (I also tried using trtexec).
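For context, the ONNX export followed the usual torch.onnx.export pattern. Below is a minimal sketch using a placeholder network, since I can't share the real model; the layer sizes, file name, and opset are illustrative only:

```python
# Sketch of the PyTorch -> ONNX export step.
# The network below is a stand-in for the real face-quality model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),  # single regression output (quality score)
)
model.eval()

dummy_input = torch.randn(1, 3, 112, 112)  # matches the engine's 1x3x112x112 input
torch.onnx.export(
    model,
    dummy_input,
    "face_quality.onnx",
    input_names=["input"],
    output_names=["score"],
    opset_version=13,
)
```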

When I load the TensorRT engine in Python (test_trt.py), the output is nearly identical to the ONNX model's, matching to about three decimal places. However, when I load the same engine in C++ (main.cpp), the output differs noticeably: for the same input, the Python script outputs 0.12 while the C++ version gives 0.18.
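For reference, test_trt.py does roughly the following (simplified sketch; the engine path and the assumption of a single 1x1 float32 output are illustrative):

```python
# Simplified sketch of the Python inference path (TensorRT 8.5 + pycuda).
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def infer(engine_path: str, input_array: np.ndarray) -> np.ndarray:
    # Deserialize the engine and create an execution context.
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # One input binding (1x3x112x112) and one output binding (assumed 1x1 here).
    output = np.empty((1, 1), dtype=np.float32)
    d_input = cuda.mem_alloc(input_array.nbytes)
    d_output = cuda.mem_alloc(output.nbytes)

    cuda.memcpy_htod(d_input, np.ascontiguousarray(input_array))
    context.execute_v2([int(d_input), int(d_output)])
    cuda.memcpy_dtoh(output, d_output)
    return output

dummy = np.random.rand(1, 3, 112, 112).astype(np.float32)
print(infer("face_quality.engine", dummy))
```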

Is this behavior normal, or could there be an issue in my C++ inference code?

Thanks

Environment

TensorRT Version: 8.5.3.1
GPU Type: NVIDIA GeForce RTX 3060
Nvidia Driver Version: 570.86.15
CUDA Version: 12.1
CUDNN Version: 8.8.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/deepstream:6.3-gc-triton-devel

Relevant Files

Unfortunately, I'm not able to share the trained model publicly, but I can provide details (or send it via private message). The model takes an input of shape 1x3x112x112.

Other files: data_tesnorrt.zip (Google Drive)

Reproduce

  1. Run the Docker container
  2. pip install -r requirements.txt
  3. Obtain a TensorRT engine
  4. Change the engine path in test_trt.py
  5. python test_trt.py
  6. make
  7. ./test <model_path> in.jpg

We discovered that the mismatch was caused by a difference in preprocessing: during training the images were processed with torchvision transforms and Pillow, while the C++ inference path used OpenCV. After retraining the model so that OpenCV preprocessing is used in both training and inference, the results are consistent and match closely.
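For anyone who hits the same problem: the gap comes from the image pipeline, not from TensorRT itself. A rough illustration of the two preprocessing paths is below; the exact transforms and any normalization constants are placeholders for whatever your model was trained with:

```python
import cv2
import numpy as np
from PIL import Image
from torchvision import transforms

# Pillow/torchvision path (what the model originally saw during training).
pil_tf = transforms.Compose([
    transforms.Resize((112, 112)),   # PIL bilinear resize
    transforms.ToTensor(),           # RGB, float32 in [0, 1], CHW
])
pil_input = pil_tf(Image.open("in.jpg").convert("RGB")).numpy()

# OpenCV path (what the C++ code feeds the engine).
bgr = cv2.imread("in.jpg")  # BGR channel order
resized = cv2.resize(bgr, (112, 112), interpolation=cv2.INTER_LINEAR)
cv_input = resized[:, :, ::-1].transpose(2, 0, 1).astype(np.float32) / 255.0

# Even with the channel order fixed, the two resize implementations
# are not bit-identical, which shifts the predicted quality score.
print(np.abs(pil_input - cv_input).max())
```

In our case, matching the channel order alone was not enough, because Pillow's and OpenCV's bilinear resamplers produce slightly different pixels; using the OpenCV pipeline consistently in training and inference removed the gap.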
