Getting different results from Serialized TensorRT Engine vs. Onnx

Description

I am currently trying to run an ONNX model using TensorRT, and I have been trying to leverage engine serialization to speed up loading times. However, I have noticed that I get different results when running the parsed ONNX model vs. running the serialized engine.

Here is a plot of confidences over time using the model loaded directly from ONNX:

I am loading the ONNX file as follows:

// Create the builder, an explicit-batch network, and the ONNX parser
auto builder = TRTUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(basic_logger));
const auto explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = TRTUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
auto parser = TRTUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, basic_logger));

// Configure the builder
auto config = TRTUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
config->setMaxWorkspaceSize(32 * MEGABYTES_TO_BYTES);
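// Populate the network from the ONNX file before building the engine
// (omitted above for brevity; model_path is a placeholder for the actual file path)
if (!parser->parseFromFile(model_path.c_str(), static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
{
    // handle/log parser errors here
}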
// Build the engine from the parsed network
auto engine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), TRTDeleter());

Here is the plot of confidences when I use the deserialized engine. It's a similar shape, but noticeably 10-20% lower in confidence (confidence being an output of our model).

To check the serialized model, I am serializing and deserializing the engine as follows:

// Serialize the built engine to a host-memory blob...
auto serialized_engine = std::shared_ptr<nvinfer1::IHostMemory>(engine->serialize(), TRTDeleter());
// ...and deserialize it again through a runtime
auto runtime = std::shared_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(basic_logger), TRTDeleter());
auto deserializedEngine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(serialized_engine->data(), serialized_engine->size()), TRTDeleter());
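For context, the eventual on-disk path I am targeting for the faster load times would look roughly like the following. This is just a sketch with an illustrative file name; the in-memory round trip above is what produced the plots.

#include <fstream>
#include <iterator>
#include <vector>

// Sketch only: write the serialized engine blob to disk...
std::ofstream out("model.engine", std::ios::binary);
out.write(static_cast<const char*>(serialized_engine->data()), serialized_engine->size());
out.close();

// ...then read it back and deserialize, exactly as with the in-memory blob above
std::ifstream in("model.engine", std::ios::binary);
std::vector<char> blob((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
auto engine_from_file = std::shared_ptr<nvinfer1::ICudaEngine>(
    runtime->deserializeCudaEngine(blob.data(), blob.size()), TRTDeleter());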

To run the model I use the following code (with either the original engine or the deserialized engine):

auto execution_context = TRTUniquePtr<nvinfer1::IExecutionContext>(engine->createExecutionContext());
bool executed = execution_context->enqueue(1, bindings, stream, nullptr);
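For context, the bindings array and the stream are set up beforehand in the usual way. A rough sketch, where the buffer sizes and host pointers are placeholders:

// Rough sketch of the surrounding setup; sizes and host pointers are placeholders
void* bindings[2];                          // one slot per engine binding (input, output)
cudaMalloc(&bindings[0], input_size_bytes); // device buffer for the input binding
cudaMalloc(&bindings[1], output_size_bytes); // device buffer for the output binding

cudaStream_t stream;
cudaStreamCreate(&stream);

// Copy the input to the device, run inference, copy the output back
cudaMemcpyAsync(bindings[0], host_input, input_size_bytes, cudaMemcpyHostToDevice, stream);
execution_context->enqueue(1, bindings, stream, nullptr);
cudaMemcpyAsync(host_output, bindings[1], output_size_bytes, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);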

Is there something I am missing when deserializing the model? Given the code listed above, I would expect the deserialized engine to produce the same results as the original, right?

Environment

OS Version: Ubuntu 18.04 x86_64
CUDA Driver: 460.80
ONNX Version: 1.6.0
TensorRT Version: 7.1.3

Model

Here is the ONNX model (with the parameters stripped out):
model_stripped.onnx (34.3 KB)

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
In the meantime, you can try a few things:

1. Validate your model with the snippet below (check_model.py):

import onnx

filename = "your_model.onnx"  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2. Try running your model with the trtexec command (for example, trtexec --onnx=model.onnx --verbose).

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

So, I have only noticed this when running with the C++ API. On my Jetson Xavier, I have verified that trtexec produces the same results before and after saving the serialized engine.

I will try making a small C++ unit test to verify the approach, roughly along the lines of the sketch below.
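Something like this is what I have in mind (a minimal sketch; runSingleInference is a hypothetical helper wrapping the binding setup and enqueue code shown earlier):

#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: runs one inference on the given engine and returns the output
std::vector<float> runSingleInference(nvinfer1::ICudaEngine& engine, const std::vector<float>& input);

// Compare outputs from the freshly built engine against the serialize/deserialize round trip
void testSerializationRoundTrip(nvinfer1::ICudaEngine& builtEngine,
                                nvinfer1::IRuntime& runtime,
                                const std::vector<float>& input)
{
    auto blob = std::shared_ptr<nvinfer1::IHostMemory>(builtEngine.serialize(), TRTDeleter());
    auto roundTripped = std::shared_ptr<nvinfer1::ICudaEngine>(
        runtime.deserializeCudaEngine(blob->data(), blob->size()), TRTDeleter());

    const auto reference = runSingleInference(builtEngine, input);
    const auto roundTrip = runSingleInference(*roundTripped, input);

    assert(reference.size() == roundTrip.size());
    for (size_t i = 0; i < reference.size(); ++i)
    {
        // Outputs should match, up to small floating-point differences
        assert(std::fabs(reference[i] - roundTrip[i]) < 1e-4f);
    }
}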

Hi @VivekKrishnan,

We recommend that you share minimal repro scripts and steps so that we can also try them from our end, for better assistance.

Thank you.