Load TensorRT engine and deserialize in C++

Where can I find a C++ sample that loads a TensorRT engine and deserializes it for inference?

This is in Python, and I’m looking for the C++ version.

  1. Serialize the engine and write it to a file:

with open("sample.engine", "wb") as f:
    f.write(engine.serialize())

  2. Read the engine from the file and deserialize:

with open("sample.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())


Hi @edit_or,

Kindly refer to the links below.


That sample loads a UFF model and creates an engine from it.
I would like to load a TensorRT engine file (detect.engine) directly in C++.
The engine was built on the same system, so I don’t need to rebuild it; I can use the engine file directly.
How do I load and deserialize it in C++?

Hi @edit_or
You can use the trtexec command to load the engine:
trtexec --loadEngine=g1.trt --batch=1


Hello AakankshaS,

I don’t think this really answers the problem we have here.

I am facing the same issue as edit_or, and the point is about loading a .engine model in C++ (loading from the terminal works).
I couldn’t find an answer to the problem, only many suggestions such as working with the DeepStream sample (nvdsinfer_custom_impl_yolo) and modifying its source. Still, loading an engine into a custom C++ app remains very mysterious.

I am not sure what the .engine file is really meant for, and the documentation seems poor on this topic. Does your answer mean we are supposed to include, and possibly modify, the trtexec source in our app to run a model? Isn’t there some sort of C++ interface/library for doing so?

Thanks for your attention.


It seems amazing to me that Nvidia is always trying to empower developers to do things with its libraries and frameworks, yet it is practically impossible to find out how to load an .engine model and perform inference with it in simple, plain C++ code.

This is all they show (barely 7 out-of-context lines). So you have to dive into Dustin Franklin’s Jetson-Inference code for hours just to understand a little of how this works.

Could you please show a simple, explained example of how to work with .engine files to perform inference?

I think this will help us to build better solutions using Nvidia frameworks. :)
Kind regards.
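
For anyone landing here with the same question, below is a minimal sketch of a C++ counterpart to the Python deserialization step quoted at the top of the thread. It is not taken from an NVIDIA sample. The names readEngineFile, deserializeEngine and StderrLogger are made up for illustration. The TensorRT calls it assumes are the ILogger interface and the two-argument IRuntime::deserializeCudaEngine(blob, size) found in TensorRT 8 and later (older releases took a third plugin-factory argument). The TensorRT part is guarded with __has_include (C++17, or a compiler extension before that) so the file-reading helper still compiles on a machine without the TensorRT headers.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole serialized engine file into memory.
std::vector<char> readEngineFile(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open engine file: " + path);
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of: " + path);
    }
    file.seekg(0, std::ios::beg);
    std::vector<char> blob(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(blob.data(), size)) {
        throw std::runtime_error("cannot read engine file: " + path);
    }
    return blob;
}

// The TensorRT-specific part is compiled only when the headers are present.
#if defined(__has_include)
#if __has_include(<NvInfer.h>)
#include <NvInfer.h>
#include <iostream>

// Hypothetical logger: prints warnings and errors to stderr.
class StderrLogger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) {
            std::cerr << msg << '\n';
        }
    }
};

// Hypothetical helper: deserialize an engine file with an existing runtime.
// Assumes the TensorRT 8+ two-argument deserializeCudaEngine.
// Returns nullptr if TensorRT rejects the blob.
nvinfer1::ICudaEngine* deserializeEngine(nvinfer1::IRuntime& runtime,
                                         const std::string& path) {
    const std::vector<char> blob = readEngineFile(path);
    return runtime.deserializeCudaEngine(blob.data(), blob.size());
}
#endif
#endif
```

Usage would look roughly like this: create a StderrLogger, create a runtime with nvinfer1::createInferRuntime(logger), call deserializeEngine(*runtime, "detect.engine"), check the result for nullptr, then call createExecutionContext() on the engine. The runtime should be kept alive for as long as the engine is in use. How buffers are bound and how inference is launched depends on the TensorRT version, so that part is left out here.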

Please share the model, script, profiler, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.

When measuring model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details: