Description
I have a regression model that predicts face quality from input images. I trained the model in PyTorch and exported it to ONNX format, then converted the ONNX model to a TensorRT engine using the provided onnx_to_tensorrt.py script (I also tried trtexec).
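The conversion itself follows the standard explicit-batch ONNX-parser flow. A simplified sketch of that step (the paths and builder settings here are illustrative placeholders, not the exact contents of onnx_to_tensorrt.py):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, engine_path, fp16=False):
    """Parse an ONNX file and serialize a TensorRT engine (TensorRT 8.5 API)."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # precision flag, off by default here

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)

# hypothetical file names
build_engine("face_quality.onnx", "face_quality.engine")
```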
When I load the TensorRT engine in Python (test_trt.py), the output matches the ONNX model to about three decimal places. However, when I load the same engine in C++ (main.cpp), the output differs: for example, for the same input the Python script outputs 0.12 while the C++ version gives 0.18.
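For reference, the Python side is the usual deserialize-and-execute pattern; a minimal sketch of a test_trt.py-style check (the binding order, the scalar output shape, and the preprocessing are assumptions here, not my exact script):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 - creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def infer(engine_path, image_chw):
    """Run the engine on one 3x112x112 float32 image and return the score."""
    runtime = trt.Runtime(TRT_LOGGER)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    inp = np.ascontiguousarray(image_chw[None], dtype=np.float32)  # 1x3x112x112
    out = np.empty(1, dtype=np.float32)  # assumes a single quality score

    d_in = cuda.mem_alloc(inp.nbytes)
    d_out = cuda.mem_alloc(out.nbytes)
    cuda.memcpy_htod(d_in, inp)
    # assumes binding 0 is the input and binding 1 the output
    context.execute_v2([int(d_in), int(d_out)])
    cuda.memcpy_dtoh(out, d_out)
    return float(out[0])
```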
Is this behavior normal, or could there be an issue in my C++ inference code?
Thanks
Environment
TensorRT Version: 8.5.3.1
GPU Type: NVIDIA GeForce RTX 3060
Nvidia Driver Version: 570.86.15
CUDA Version: 12.1
CUDNN Version: 8.8.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/deepstream:6.3-gc-triton-devel
Relevant Files
Unfortunately, I’m not able to share the trained model, but I can provide some details (I can send it via private message). The model takes an input of shape 1x3x112x112.
Other files: data_tesnorrt.zip (Google Drive link)
Reproduce
- Run docker
- pip install -r requirements.txt
- get a TensorRT engine
- change engine path in test_trt.py
- python test_trt.py
- make
- ./test <model_path> in.jpg and compare the score with the Python output (see the preprocessing sketch below)
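One way to check whether the gap comes from preprocessing rather than from the engine is to dump the preprocessed tensor once and feed the identical bytes to both test_trt.py and the C++ binary. A hypothetical dump script (the resize/normalization below is an assumption; adjust it to whatever the two programs actually do):

```python
import cv2
import numpy as np

# Assumed preprocessing: BGR jpeg -> RGB, resize to 112x112, scale to [0, 1], CHW.
img = cv2.imread("in.jpg")                          # BGR, HWC, uint8
img = cv2.resize(img, (112, 112))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
chw = np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))
blob = np.ascontiguousarray(chw[None])              # 1x3x112x112

np.save("input.npy", blob)   # load this from the Python test
blob.tofile("input.bin")     # raw float32 buffer for the C++ test
```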