ONNX model and TensorRT engine give different outputs

Description

I have exported a PyTorch model to ONNX, and the outputs match, so the ONNX model seems to be working as expected. However, after generating a TensorRT engine from this ONNX file, the outputs are different.
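The PyTorch-vs-ONNX parity check was roughly of this form (a minimal sketch; model, dummy_input, and lcc.onnx come from my export step):

import numpy as np
import torch
import onnxruntime as ort

# model and dummy_input are the network and sample input used for torch.onnx.export
with torch.no_grad():
    torch_out = model(dummy_input).cpu().numpy()

# run the exported file through ONNX Runtime and compare against PyTorch
sess = ort.InferenceSession("lcc.onnx")
ort_out = sess.run(None, {sess.get_inputs()[0].name: dummy_input.cpu().numpy()})[0]
print("max abs diff:", np.abs(torch_out - ort_out).max())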

Environment

TensorRT Version: 7.2.3.4
GPU Type: GTX 1650 - 4GB
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8.5
PyTorch Version (if applicable): 1.9.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:21.05-py3

Relevant Files

Steps To Reproduce

Environment setup:

  1. Build the Docker container

    chmod +x build_container.sh
    ./build_container.sh
    
  2. Run the container

    chmod +x run_container.sh
    ./run_container.sh
    

Running the ONNX model:

python lcc_onnx.py

Output:

Using ONNX as inference backend
Using weight: lcc.onnx

[[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.4570302
  0.5993874  0.         0.         0.         0.        ]
 [0.41986537 0.2868093  0.         0.5969408  0.84598017 0.9300823
  0.         0.05123539 0.99220806 0.         0.        ]
 [1.2950418  1.3727119  0.         0.9899633  0.         0.
  0.         0.         0.         0.         0.9957021 ]
 [0.         0.03012113 0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.        ]]

Running the TensorRT model:

python lcc_trt.py

Output:

Loading ONNX file: 'lcc.onnx'
[TensorRT] WARNING: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Completed parsing of ONNX file
converting to fp16
Building an Engine...
Completed creating Engine
Elapsed: 40.106 sec

[[2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.5950394 2.5950394 2.5950394 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [2.0784717 2.0784717 2.0784717 2.0784717 2.0784717 0.        0.
  0.        0.        0.        0.       ]
 [0.        0.        0.        0.        0.        0.        0.
  0.        0.        0.        0.       ]]

As you can see, the outputs are completely different. One more strange behaviour I noticed: the TensorRT engine gives almost the same output for different input images. Please share any pointers or help me debug the issue.

Thanks

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:

  1. Validate your model with the below snippet:

check_model.py

import sys
import onnx

# load the model and run the ONNX checker; this raises if the model is malformed
filename = sys.argv[1]  # path to your ONNX model, e.g. lcc.onnx
model = onnx.load(filename)
onnx.checker.check_model(model)
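You can run it as, for example, python check_model.py lcc.onnx; it exits silently on success and raises a ValidationError if the model is malformed.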
  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec --verbose log for further debugging.
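For reference, a minimal invocation for the model above might look like this (--fp16 mirrors the fp16 build shown in your log; drop it for a pure FP32 comparison):

trtexec --onnx=lcc.onnx --fp16 --verbose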
Thanks!

I checked the model with the given snippet; it doesn’t throw any error.
Here is the trtexec verbose output

The ONNX model and related files are included in the original post.
Thanks

Hi,

In our experience, it is not expected to have such a high level of matching between TensorRT and ONNX Runtime, or between any two implementations of a DL model, whether on CPU, GPU, or a mix.
TensorRT provides no way to guarantee bitwise-identical results.
DL networks are typically robust against changes in the order of FP operations.
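In practice, two backends are compared with a numeric tolerance rather than exact equality; a minimal sketch, assuming onnx_out and trt_out are the two output arrays:

import numpy as np

# element-wise comparison within relative/absolute tolerances
print("match within tolerance:", np.allclose(onnx_out, trt_out, rtol=1e-3, atol=1e-4))
print("max abs diff:", np.max(np.abs(onnx_out - trt_out)))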

But please do let me know if this is impacting the accuracy in your case.

Thanks

Hey @spolisetty

I have ported multiple DL models to TensorRT before, but this is the first time I am encountering this kind of issue.
Yes, this not only impacts the accuracy: for any input image I get almost the same output, i.e. the post-processing result is the same.
Thanks

Could you please confirm whether you are facing the same issue on the latest TensorRT version, 8.2 EA?
Performance issues have been resolved in the latest version.

I tried the latest NGC container, nvcr.io/nvidia/tensorrt:21.09-py3, which has tensorrt==8.0.3.0.
It gives the same output; moreover, after the first inference I see Segmentation fault (core dumped).

Hi,

We recommend you to please try the latest version and share the complete error logs with us.

Thank you.

Hi @spolisetty,
I tried running the same ONNX model via TensorRT 8.2.
As I mentioned in the previous replies, there is no error, but the output is not what I expected.

Here is the trtexec verbose output for v8.2

I also tried another approach, torch2trt, which produced these warnings:

Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float
Warning: Encountered known unsupported method torch.ge
Warning: Encountered known unsupported method torch.Tensor.float

It seems torch2trt is having difficulty converting these methods; as per the docs, we can overcome this by writing converters for the unsupported methods, as sketched below.
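A minimal sketch of such a converter for torch.ge, following the pattern of torch2trt's built-in converters (the add_missing_trt_tensors helper is assumed from torch2trt.torch2trt; a >= b is built as NOT(a < b), and broadcasting and edge cases are omitted):

import tensorrt as trt
from torch2trt import tensorrt_converter
from torch2trt.torch2trt import add_missing_trt_tensors

@tensorrt_converter('torch.ge')
def convert_ge(ctx):
    # arguments of the intercepted torch.ge call and its already-computed result
    a, b = ctx.method_args[0], ctx.method_args[1]
    output = ctx.method_return
    # ensure both operands are represented as TensorRT tensors in the network
    a_trt, b_trt = add_missing_trt_tensors(ctx.network, [a, b])
    # a >= b is equivalent to NOT(a < b)
    less = ctx.network.add_elementwise(a_trt, b_trt, trt.ElementWiseOperation.LESS)
    ge = ctx.network.add_unary(less.get_output(0), trt.UnaryOperation.NOT)
    output._trt = ge.get_output(0)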

Hi,

Sorry for the delayed response. Are you still facing this issue?

Yes, I am still facing the same issue.

Hi,

We have a similar known issue. I believe it is fixed in TensorRT version 8.2 GA update 1, which was released recently.
We request you to please verify one last time on the above version. If you still face this issue, please let us know; it will be fixed in future releases.

Thank you.

I got a similar issue and tried all of the above. When inferencing through Detectron2 transformers for panoptic segmentation, converted into a TensorRT serialized plan, everything works fine until output generation: the output is different from the ONNX output, and changing the input doesn't affect it; the output remains the same.

Hi, I got a similar issue and tried repeating the allocation process. Somehow, that gets past the issue.
I repeat this block one extra time in the inference phase:

# Allocate host and device buffers
bindings = []
for binding in engine:
    binding_idx = engine.get_binding_index(binding)
    size = trt.volume(context.get_binding_shape(binding_idx))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    if engine.binding_is_input(binding):
        input_buffer = np.ascontiguousarray(input_image)
        input_memory = cuda.mem_alloc(input_image.nbytes)
        bindings.append(int(input_memory))
    else:
        output_buffer = cuda.pagelocked_empty(size, dtype)
        output_memory = cuda.mem_alloc(output_buffer.nbytes)
        bindings.append(int(output_memory))
For context, here is the full inference function with the duplicated allocation block (it assumes the usual imports: numpy as np, tensorrt as trt, pycuda.driver as cuda with pycuda.autoinit, and PIL's Image):

def infer(engine, input_file, output_file):
    print("Reading input image from file {}".format(input_file))
    with Image.open(input_file) as img:
        input_image = preprocess(img)
        image_width = img.width
        image_height = img.height

    with engine.create_execution_context() as context:
        # Set input shape based on image dimensions for inference
        context.set_binding_shape(engine.get_binding_index("input"), (1, 3, image_height, image_width))
        # Allocate host and device buffers
        bindings = []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer = cuda.pagelocked_empty(size, dtype)
                output_memory = cuda.mem_alloc(output_buffer.nbytes)
                bindings.append(int(output_memory))

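        # Second allocation pass: this deliberate repeat is the workaround described above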
        bindings = []
        for binding in engine:
            binding_idx = engine.get_binding_index(binding)
            size = trt.volume(context.get_binding_shape(binding_idx))
            dtype = trt.nptype(engine.get_binding_dtype(binding))
            if engine.binding_is_input(binding):
                input_buffer = np.ascontiguousarray(input_image)
                input_memory = cuda.mem_alloc(input_image.nbytes)
                bindings.append(int(input_memory))
            else:
                output_buffer = cuda.pagelocked_empty(size, dtype)
                output_memory = cuda.mem_alloc(output_buffer.nbytes)
                bindings.append(int(output_memory))

        stream = cuda.Stream()
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(input_memory, input_buffer, stream)
        # Run inference
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        # Transfer prediction output from the GPU.
        cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)
        # Synchronize the stream
        stream.synchronize()

    with postprocess(np.reshape(output_buffer, (image_height, image_width))) as img:
        print("Writing output image to file {}".format(output_file))
        img.convert('RGB').save(output_file, "PPM")