ONNX model and TensorRT engine give different outputs

Description

I have exported a PyTorch model to ONNX, and the outputs match, so the ONNX model seems to be working as expected. However, after generating a TensorRT engine from this ONNX file, the outputs are different.
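The PyTorch-vs-ONNX parity check was roughly of this form (a minimal sketch; model, dummy_input, and lcc.onnx come from my export step):

import numpy as np
import torch
import onnxruntime as ort

# model and dummy_input are the network and sample input used for torch.onnx.export
with torch.no_grad():
    torch_out = model(dummy_input).cpu().numpy()

# run the exported file through ONNX Runtime and compare against PyTorch
sess = ort.InferenceSession("lcc.onnx")
ort_out = sess.run(None, {sess.get_inputs()[0].name: dummy_input.cpu().numpy()})[0]
print("max abs diff:", np.abs(torch_out - ort_out).max())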

Environment

TensorRT Version: 7.2.3.4
GPU Type: GTX 1650 - 4GB
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8.5
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:21.05-py3

Relevant Files

Steps To Reproduce

Environment setup:

  1. Build the Docker container

    chmod +x build_container.sh
    ./build_container.sh
    
  2. Run the container

    chmod +x run_container.sh
    ./run_container.sh
    

Running the ONNX model:

python lcc_onnx.py

Output:

Using ONNX as inference backend
Using weight: lcc.onnx

[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.4570302
  0.5993874  0.         0.         0.         0.        ]
 [0.41986537 0.2868093  0.         0.5969408  0.84598017 0.9300823
  0.         0.05123539 0.99220806 0.         0.        ]
 [1.2950418  1.3727119  0.         0.9899633  0.         0.
  0.         0.         0.         0.         0.9957021 ]
 [0.         0.03012113 0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]]

Running the TensorRT model:

python lcc_trt.py

Output:

Loading ONNX file: 'lcc.onnx'
[TensorRT] WARNING: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing of ONNX file
converting to fp16
Building an Engine...
Completed creating Engine
Elapsed: 40.106 sec

[[2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [0.        0.        0.        0.        0.        0.        0.
  0.        0.        0.        0.       ]]

As you can see, the outputs are completely different. One more strange behaviour I noticed: the TensorRT engine gives almost the same output for different input images. Please share any pointers or help me debug the issue.

Thanks

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import sys
import onnx

# load the model and run the ONNX checker; this raises if the model is malformed
filename = sys.argv[1]  # path to your ONNX model, e.g. lcc.onnx
model = onnx.load(filename)
onnx.checker.check_model(model)
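You can run it as, for example, python check_model.py lcc.onnx; it exits silently on success and raises a ValidationError if the model is malformed.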
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
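For reference, a minimal invocation for the model above might look like this (--fp16 mirrors the fp16 build shown in your log; drop it for a pure FP32 comparison):

trtexec --onnx=lcc.onnx --fp16 --verbose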
Thanks!

I checked the model with the given snippet; it doesn’t throw any error.
Here is the trtexec verbose output

The ONNX model and related files are included in the original post.
Thanks

Hi,

In our experience, it is not expected to have such a high level of matching between TensorRT and ONNX Runtime, or between any two implementations of a DL model, whether on CPU, GPU, or a mix.
TensorRT provides no way to guarantee bitwise-identical results.
DL networks are typically robust against changes in the order of FP operations.
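In practice, two backends are compared with a numeric tolerance rather than exact equality; a minimal sketch, assuming onnx_out and trt_out are the two output arrays:

import numpy as np

# element-wise comparison within relative/absolute tolerances
print("match within tolerance:", np.allclose(onnx_out, trt_out, rtol=1e-3, atol=1e-4))
print("max abs diff:", np.max(np.abs(onnx_out - trt_out)))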

But please do let me know if this is impacting the accuracy in your case.

Thanks

Hey @spolisetty

I have ported multiple DL models to TensorRT before, but this is the first time I am encountering this kind of issue.
Yes, this not only impacts the accuracy: for any input image I get almost the same output, i.e. the post-processing result is the same.
Thanks

Could you please confirm whether you are facing the same issue on the latest TensorRT version, 8.2 EA?
Performance issues have been resolved in the latest version.

I tried the latest NGC container, nvcr.io/nvidia/tensorrt:21.09-py3, which has tensorrt==8.0.3.0.
It gives the same output; moreover, after the first inference I see Segmentation fault (core dumped).

Hi,

We recommend you to please try the latest version and share the complete error logs with us.

Thank you.

Hi @spolisetty,
I tried running the same ONNX model via TensorRT 8.2.
As I mentioned in the previous replies, there is no error, but the output is not what I expected.

Here is the trtexec verbose output for v8.2

I also tried another approach, torch2trt, which produced these warnings:

Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float

It seems torch2trt is having difficulty converting these methods; as per the docs, we can overcome this by writing converters for the unsupported methods, as sketched below.
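A minimal sketch of such a converter for torch.ge, following the pattern of torch2trt's built-in converters (the add_missing_trt_tensors helper is assumed from torch2trt.torch2trt; a >= b is built as NOT(a < b), and broadcasting and edge cases are omitted):

import tensorrt as trt
from torch2trt import tensorrt_converter
from torch2trt.torch2trt import add_missing_trt_tensors

@tensorrt_converter('torch.ge')
def convert_ge(ctx):
    # arguments of the intercepted torch.ge call and its already-computed result
    a, b = ctx.method_args[0], ctx.method_args[1]
    output = ctx.method_return
    # ensure both operands are represented as TensorRT tensors in the network
    a_trt, b_trt = add_missing_trt_tensors(ctx.network, [a, b])
    # a >= b is equivalent to NOT(a < b)
    less = ctx.network.add_elementwise(a_trt, b_trt, trt.ElementWiseOperation.LESS)
    ge = ctx.network.add_unary(less.get_output(0), trt.UnaryOperation.NOT)
    output._trt = ge.get_output(0)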

Hi,

Sorry for the delayed response. Are you still facing this issue?

Yes, I am still facing the same issue.

Hi,

We have a similar known issue. I believe it is fixed in TensorRT version 8.2 GA update 1, which was released recently.
We request you to please verify one last time on the above version. If you still face this issue, please let us know; it will be fixed in future releases.

Thank you.

I got a similar issue and tried all of the above. When inferencing through Detectron2 transformers for panoptic segmentation, converted into a TensorRT serialized plan, everything works fine until output generation: the output is different from the ONNX output, and changing the input doesn't affect it; the output remains the same.

Hi, I got a similar issue and tried repeating the allocation process. Somehow, that gets past the issue.
I repeat this block one extra time in the inference phase:

# Allocate host and device buffers
bindings = []
for binding in engine:
    binding_idx = engine.get_binding_index(binding)
    size = trt.volume(context.get_binding_shape(binding_idx))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    if engine.binding_is_input(binding):
        input_buffer = np.ascontiguousarray(input_image)
        input_memory = cuda.mem_alloc(input_image.nbytes)
        bindings.append(int(input_memory))
    else:
        output_buffer = cuda.pagelocked_empty(size, dtype)
        output_memory = cuda.mem_alloc(output_buffer.nbytes)
        bindings.append(int(output_memory))
For context, here is the full inference function with the duplicated allocation block (it assumes the usual imports: numpy as np, tensorrt as trt, pycuda.driver as cuda with pycuda.autoinit, and PIL's Image):

def infer(engine, input_file, output_file):
    print("Reading input image from file {}".format(input_file))
    with Image.open(input_file) as img:
        input_image = preprocess(img)
        image_width = img.width
        image_height = img.height

    with engine.create_execution_context() as context:
        # Set input shape based on image dimensions for inference
        context.set_binding_shape(engine.get_binding_index("input"), (1, 3, image_height, image_width))
        # Allocate host and device buffers
        bindings = []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer = cuda.pagelocked_empty(size, dtype)
                output_memory = cuda.mem_alloc(output_buffer.nbytes)
                bindings.append(int(output_memory))

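        # Second allocation pass: this deliberate repeat is the workaround described above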
        bindings = []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer = cuda.pagelocked_empty(size, dtype)
                output_memory = cuda.mem_alloc(output_buffer.nbytes)
                bindings.append(int(output_memory))

        stream = cuda.Stream()
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(input_memory, input_buffer, stream)
        # Run inference
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer prediction output from the GPU.
        cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)
        # Synchronize the stream
        stream.synchronize()

    with postprocess(np.reshape(output_buffer, (image_height, image_width))) as img:
        print("Writing output image to file {}".format(output_file))
        img.convert('RGB').save(output_file, "PPM")