TensorRT ICudaEngine serialization size differs for the same UFF-file INetworkDefinition

I’m working on the Jetson TX2 developer kit board.
My JetPack version is 3.2.1, but I updated TensorRT and cuDNN via JetPack 3.3, which means I have TensorRT 4 and cuDNN 7.1.5.

I wrote an application based on the SampleUffMNIST example and it works fine.

But I am seeing a strange phenomenon:
Sometimes, for the same INetworkDefinition object, built from the same UFF file input via a parser object (created with nvuffparser::createUffParser()) and the same IBuilder object, the ICudaEngine serialize() operation returns an IHostMemory object whose size() method reports a different value than the previous run with the same inputs.

The size difference can be dozens of MB.

Strangest of all, after I saved them to disk and reloaded them using the deserializeCudaEngine method, all of them execute properly via the IExecutionContext object that was created.

So my question is general:
Is it possible for the serialized data to have a different size (length in bytes) for the same inputs?
Is it GPU/OS state dependent?
If yes, how?

If not, I will provide my code.


Hello, can you provide details on the platforms you are using?

Linux distro and version
GPU type
nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version

Please provide your source. It’ll help us debug.

Thank you very much for your response.
I haven’t answered yet because I have been out of the office and don’t have the required information with me.
I will be back in my office on Sunday, and then I will be able to provide all the required answers.

Thanks again!

The TensorFlow pb (which was converted to the UFF file) was generated on the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type – GTX-1070TI
nvidia driver version – 384.130
CUDA version – 8.0.44
CUDNN version – 6.0.21
Python version – 3.5.2
Tensorflow version – 1.4.1
TensorRT version – Not used

The TensorRT uff was generated under the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – 3.5.2
Tensorflow version – 1.8
TensorRT version –

The TensorRT CUDA engine serialized file was generated under the following platforms:

Only C++ - No Python usage at all.
Only TensorRT - No Tensorflow usage at all

Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – not used
Tensorflow version – Not used
TensorRT version –

Jetson TX2 developer kit board:
Linux distro and version –
Ubuntu 16.04.5 LTS (Xenial Xerus)
L4T -
#R28 (release), REVISION 2.1, GCID: 11272647, BOARD: t186ref, EABI: aarch64, DATE: Thu May 17 07:29:06 UTC 2018
GPU type -
As part of the Jetson TX2 developer kit board
JetPack –
3.2.1 (But TensorRT and CUDNN were updated according to JetPack 3.3 versions)
nvidia driver version - As part of the JetPack
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.5
Python version – Not used
Tensorflow version – Not used
TensorRT version –

The TensorRT UFF file was generated using these Python commands:

import uff
uff_model = uff.from_tensorflow_frozen_model("My pb file path", ["List of my graph output node names"], text="My uff text file path", list_nodes=False, output_filename="My uff binary file path")

The TensorRT CUDA engine serialized file was generated using these C++ commands:

/* Objects and types declarations: */
nvuffparser::IUffParser *m_parser;
nvinfer1::IBuilder *m_builder;
nvinfer1::INetworkDefinition *m_network;
nvinfer1::ICudaEngine *m_engine;
nvinfer1::IExecutionContext *m_context;

m_parser = nvuffparser::createUffParser();

/* Register each graph input and each graph output name with the parser
   (the name/dims fields stand in for the application's own input descriptors) */
for (auto& tensorInput : m_tensorsInputs)
    m_parser->registerInput(tensorInput.name.c_str(), tensorInput.dims);

for (auto& tensorOutput : m_tensorOutputsNames)
    m_parser->registerOutput(tensorOutput.c_str());

m_builder = nvinfer1::createInferBuilder(m_tRTLogger);
m_network = m_builder->createNetwork();

if (!m_parser->parse(m_uffFilePath.c_str(), *m_network, nvinfer1::DataType::kFLOAT))
{
    m_tRTLogger.log(nvinfer1::ILogger::Severity::kERROR, "Fail to parse");
    throw std::runtime_error("CUDA engine serialization operation error!!!");
}

/* After the network has been defined, build the engine by configuring the builder */
m_engine = m_builder->buildCudaEngine(*m_network);

/* The network and the parser are no longer needed and can be destroyed */
m_network->destroy();
m_parser->destroy();

nvinfer1::IHostMemory *engineSerialized = m_engine->serialize();

/* Open a new file for holding the serialized engine data */
std::ofstream engineSerializedFile;
engineSerializedFile.open(engineSerializedFilePath.string(), std::ios::out | std::ios::binary);

if (engineSerializedFile.is_open() && engineSerializedFile.good() && !engineSerializedFile.fail())
{
    /* Save the serialized engine data into the file */
    engineSerializedFile.write(reinterpret_cast<const char *>(engineSerialized->data()), engineSerialized->size());

    /* Close the file */
    engineSerializedFile.close();
}
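
For reference, reloading the saved engine later only requires reading the file back into a host buffer and handing it to the TensorRT runtime. A minimal sketch of the file-reading half in standard C++ (the TensorRT call itself is shown only as a comment, and `loadEngineBlob` is a hypothetical helper name):

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

/* Load a serialized engine file into a host buffer.
   The resulting bytes would then be passed to the TensorRT runtime, e.g.:
   nvinfer1::createInferRuntime(logger)->deserializeCudaEngine(
       blob.data(), blob.size(), nullptr); */
std::vector<char> loadEngineBlob(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("Cannot open engine file: " + path);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```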


Hello orong13,

In general, the CUDA engine produced by a given INetworkDefinition by a TensorRT builder may depend on various system factors (GPU, OS/kernel, CPU, system load, available memory, etc.) that affect layer implementation availability and timing during the process of building an engine.

The serialization of the engine is dependent on the layer implementations chosen for that engine and therefore is also dependent on these system factors. It is therefore not unexpected behavior if a given INetworkDefinition produces different engines which have different serializations over multiple runs. If the same engine in TensorRT produced different serializations over multiple runs, that would be unexpected behavior.
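
A quick way to observe this is simply to compare the on-disk sizes of engines serialized across separate builder runs. A minimal sketch in standard C++ (the helper name and any file paths passed to it are hypothetical examples):

```cpp
#include <fstream>
#include <string>

/* Return the size in bytes of a serialized engine file,
   or -1 if the file cannot be opened. */
std::streamoff fileSizeBytes(const std::string &path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<std::streamoff>(in.tellg()) : -1;
}
```

Different sizes across runs for the same INetworkDefinition are not by themselves a sign of corruption, as long as each file deserializes and executes correctly.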


Thank you for your clarification.
I understand that several system factors can affect the generated CUDA engine, but once a given CUDA engine has been generated it cannot be serialized differently.

Last questions regarding this issue:

  1. The maximum size gap that I got was more than ~500 MB: first I got a serialized file of ~600 MB, and then one of less than ~100 MB. Both of them executed properly when I deserialized them, despite the fact that the generated CUDA engines must be different (due to these system factors).

    Is this an expected size gap?
    Do these system factors really have that much impact?
    Does it mean that my graph size and layers were degraded so much, yet still executed correctly?
    Do I have any control over these system factors, or do you have any recommendations on the best system-factor state for generating the best CUDA engine (in terms of performance and detection quality)?