Thanks for this amazing acceleration library; it delivers great inference speed with TensorRT.
But building the engine takes quite a long time.
Is there any way to save the built engine, so that I don't have to wait for the build every time I run my code?
The build phase can take considerable time, especially when running on embedded
platforms. Therefore, a typical application will build an engine once, and then serialize it
for later use.
I think you want to serialize your engine. Serializing transforms the engine into a format you can store and load later for inference; to use it, you simply deserialize it. Serializing and deserializing are optional, but since building an engine from a Network Definition can be time-consuming, you can avoid rebuilding it on every run of the application by serializing it once and deserializing it before inference. That is why, after the engine is built, users typically serialize it for later use.
@yuvaram: the TensorRT samples do quite a bit of serializing (to memory) and deserializing (from memory). Just grep the samples for serialize( and deserializeCudaEngine. To write a model to disk, your code above looks OK (if you're on Linux), but it would be better if you wrote
(and I believe this is necessary on Windows). To read it back from disk, do the inverse using (for example) a std::ifstream, and then call deserializeCudaEngine as in the samples.