TensorRT 5: How to save the engine after it has been built

Thanks for this amazing acceleration library; inference is much faster with TensorRT.

But building the engine takes too much time.
Is there a way to save the built engine so that I don’t have to wait for the build every time I run my code?


The build phase can take considerable time, especially when running on embedded
platforms. Therefore, a typical application will build an engine once, and then serialize it
for later use.


Yep, it does take a lot of time.
So can I save the engine in some kind of format so that I can skip the build step?


I think you want to serialize your engine. Serializing transforms the engine into a format that you can store and use later for inference; to run inference, you simply deserialize the engine. Both steps are optional, but because creating an engine from the network definition can be time-consuming, you can avoid rebuilding it every time the application reruns by serializing it once and deserializing it at inference time.

Please refer to: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#serial_model_python

Hello,

I am trying to write the serialized engine to disk using C++ code. I cannot find the full code for this in the documentation, where the example is incomplete:

IHostMemory *serializedModel = engine->serialize();
// store model to disk
// <…>

Can someone complete the "store model to disk" part and share it? It would help me a lot.



I found the answer to my question. Here is the code to save the serialized model to disk:

trtModelStream = engine->serialize();
ofstream p("googlenet_engine.engine");
p.write((const char*)trtModelStream->data(),trtModelStream->size());

Please share the correct code to read back the data and size of the model. I am new to C++.
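
For later readers, the write step in the snippet above could be wrapped in a small helper that checks for errors. This is only a sketch: writeEngineBlob is a hypothetical name, not part of TensorRT, and it assumes you pass in the pointer and size reported by the IHostMemory object (its data() and size() calls). The file is opened in binary mode so the bytes reach the disk unchanged on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of TensorRT): writes `size` bytes starting at
// `data` to the file `path`. Returns true only if every byte was written.
bool writeEngineBlob(const std::string& path, const void* data, std::size_t size)
{
    assert(data != nullptr || size == 0);  // a non-empty blob needs a valid pointer

    // Binary mode prevents newline translation on Windows.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;  // the file could not be opened

    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    out.close();       // flush before checking the stream state
    return !out.fail();
}

// Assumed usage with the snippet above (needs TensorRT, so it is left as a comment):
//   nvinfer1::IHostMemory* trtModelStream = engine->serialize();
//   bool ok = writeEngineBlob("googlenet_engine.engine",
//                             trtModelStream->data(), trtModelStream->size());
//   trtModelStream->destroy();
```

Returning a bool keeps the caller in control of what happens when the disk is full or the path is not writable, which the bare ofstream calls would silently ignore.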


@yuvaram: the TensorRT samples do quite a bit of serializing (to memory) and deserializing (from memory). Just grep the samples for serialize( and deserializeCudaEngine. To write a model to disk, your code above looks OK (if you’re on Linux), but it would be better if you wrote

std::ofstream p("googlenet_engine.engine", std::ios::binary);

(and this, I believe, is necessary on Windows). To read it from disk, you’d do the inverse process using (for example) a std::ifstream, and then call deserializeCudaEngine as in the samples.

edit: corrected serializeCudaEngine -> serialize
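
To make the "inverse process" concrete, here is a minimal sketch of the read side. readEngineBlob is a hypothetical helper name, not a TensorRT function; it only loads the file into memory. The commented lines at the end show how the buffer would then be handed to deserializeCudaEngine, assuming a TensorRT 5 style runtime and a logger object called gLogger as in the samples.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of TensorRT): reads a whole file into memory.
// Throws std::runtime_error if the file cannot be opened or fully read.
std::vector<char> readEngineBlob(const std::string& path)
{
    // Open in binary mode, positioned at the end so tellg() reports the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    const std::streamoff size = static_cast<std::streamoff>(in.tellg());
    if (size < 0)
        throw std::runtime_error("cannot determine size of " + path);

    std::vector<char> blob(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);  // rewind before reading
    if (size > 0 && !in.read(blob.data(), static_cast<std::streamsize>(size)))
        throw std::runtime_error("short read from " + path);

    // Sanity check: the number of bytes read matches the reported file size.
    assert(static_cast<std::streamoff>(in.gcount()) == size);
    return blob;
}

// Assumed usage (needs TensorRT, so it is left as a comment):
//   std::vector<char> blob = readEngineBlob("googlenet_engine.engine");
//   nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);
//   nvinfer1::ICudaEngine* engine =
//       runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
```

The vector owns the bytes, so blob.data() and blob.size() give the pointer and size that the deserialize call expects, and the memory is released automatically once the engine has been created.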


Is there any C++ sample that uses engine serialization to save the TRT engine?