TensorRT ICudaEngine serialization size differs for the same UFF file INetworkDefinition

Hello,
I’m working on the Jetson TX2 developer kit board.
My JetPack version is 3.2.1, but I updated TensorRT and cuDNN via JetPack 3.3, which means I have TensorRT 4 and cuDNN 7.1.5.

I wrote an application based on the SampleUffMNIST example and it works fine.

But I see a strange phenomenon:
Sometimes, for the same INetworkDefinition object, built from the same UFF file input via a parser object (created with nvuffparser::createUffParser()) and the same IBuilder object, the ICudaEngine serialize operation returns an IHostMemory object whose size() method returns a different value than on a previous run with the same inputs.

The size difference can be dozens of MB.

And the strangest fact: after I saved them to disk and reloaded them using the deserializeCudaEngine method, all of them executed properly via the IExecutionContext object that was created.
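
Concretely, the check that exposes the difference is just this (a simplified fragment of my code):

nvinfer1::IHostMemory *serialized = m_engine->serialize();
/*size() returns a different value on different runs with the same UFF input*/
std::size_t serializedSize = serialized->size();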

So, my question is general:
Is it possible that the serialized data will have a different size (length in bytes) for the same inputs?
Is it GPU/OS state dependent?
If yes, how?

If not, I will provide my code.

Thanks,

Hello, can you provide details on the platforms you are using?

Linux distro and version
GPU type
nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version

Please provide your source. It’ll help us debug.

Thank you very much for your response.
I haven’t answered yet because I was out of the office until today, and I won’t have the required information until I’m back at work.
I will be back in the office on Sunday, and then I will be able to provide all the required answers.

Thanks again!

Hello,
The TensorFlow .pb file (which was later converted to the UFF file) was generated on the following platform:
PC#1:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type – GTX-1070TI
nvidia driver version – 384.130
CUDA version – 8.0.44
CUDNN version – 6.0.21
Python version – 3.5.2
Tensorflow version – 1.4.1
TensorRT version – Not used

The TensorRT UFF file was generated on the following platform:
PC#2:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – 3.5.2
Tensorflow version – 1.8
TensorRT version – 4.0.1.6

The TensorRT CUDA engine serialized file was generated on the following platforms:

Only C++ - no Python usage at all.
Only TensorRT - no TensorFlow usage at all.

PC#2:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – not used
Tensorflow version – Not used
TensorRT version – 4.0.1.6

Jetson TX2 developer kit board:
Linux distro and version –
Ubuntu 16.04.5 LTS (Xenial Xerus)
L4T -
#R28 (release), REVISION 2.1, GCID: 11272647, BOARD: t186ref, EABI: aarch64, DATE: Thu May 17 07:29:06 UTC 2018
GPU type -
As part of the Jetson TX2 developer kit board
JetPack –
3.2.1 (But TensorRT and CUDNN were updated according to JetPack 3.3 versions)
nvidia driver version - As part of the JetPack
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.5
Python version – Not used
Tensorflow version – Not used
TensorRT version – 4.0.1.6

The TensorRT UFF file was generated using these Python commands:

import uff
uff_model = uff.from_tensorflow_frozen_model("My pb file path", ["My graph output node names"], text="My uff text file path", list_nodes=False, output_filename="My uff binary file path")

The TensorRT CUDA engine serialized file was generated using these C++ commands:

/*Objects and types declarations:*/
nvuffparser::IUffParser *m_parser;
nvinfer1::IBuilder *m_builder;
nvinfer1::INetworkDefinition *m_network;
nvinfer1::ICudaEngine *m_engine;
nvinfer1::IExecutionContext *m_context;

m_parser = nvuffparser::createUffParser();

for (auto& tensorInput : m_tensorsInputs)
{
    m_parser->registerInput(tensorInput->m_name.c_str(),
                            nvinfer1::DimsCHW(tensorInput->m_dimC,
                                              tensorInput->m_dimH,
                                              tensorInput->m_dimW),
                            nvuffparser::UffInputOrder::kNCHW);
}

for (auto& tensorOutput : m_tensorOutputsNames)
{
    m_parser->registerOutput(tensorOutput.c_str());
}

m_builder = nvinfer1::createInferBuilder(m_tRTLogger);

m_builder->setMaxBatchSize(m_maxBatchSize);

m_builder->setMaxWorkspaceSize(MAX_WORKSPACE);

m_network = m_builder->createNetwork();

if (!m_parser->parse(m_uffFilePath.c_str(), *m_network, nvinfer1::DataType::kFLOAT))
{
    m_tRTLogger.log(nvinfer1::ILogger::Severity::kERROR, "Failed to parse the UFF file");
    throw std::runtime_error("CUDA engine serialization operation error!!!");
}

/*After the network has been defined, build the engine by configuring the builder*/
m_engine = m_builder->buildCudaEngine(*m_network);

/*We can now destroy the network and the builder*/
m_network->destroy();
m_builder->destroy();

nvinfer1::IHostMemory *engineSerialized = m_engine->serialize();

std::ofstream engineSerializedFile;

/*Open a new file for holding the serialized engine data*/
engineSerializedFile.open(engineSerializedFilePath.string(), std::ios::out | std::ios::binary);

if (engineSerializedFile.is_open() && engineSerializedFile.good() && !engineSerializedFile.fail())
{
    /*Save the serialized engine data into the file*/
    engineSerializedFile.write(reinterpret_cast<const char *>(engineSerialized->data()), engineSerialized->size());

    /*Close the file*/
    engineSerializedFile.close();
}
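
After writing, the serialized blob can be released; the reload path on the target, in a separate run, is roughly the following (a simplified sketch of my code; the runtime, path, and buffer variables are placeholders for my actual members):

/*Release the serialized blob once it has been written to disk*/
engineSerialized->destroy();

/*Reload the engine from the file (TensorRT 4 API)*/
nvinfer1::IRuntime *runtime = nvinfer1::createInferRuntime(m_tRTLogger);

std::ifstream engineFile(engineSerializedFilePath.string(), std::ios::in | std::ios::binary | std::ios::ate);
const std::streamsize blobSize = engineFile.tellg();
engineFile.seekg(0, std::ios::beg);

std::vector<char> blob(static_cast<size_t>(blobSize));
engineFile.read(blob.data(), blobSize);

/*No plugin factory is needed in my case, hence the nullptr*/
m_engine = runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
m_context = m_engine->createExecutionContext();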

Thanks,

Hello orong13,

In general, the CUDA engine produced from a given INetworkDefinition by the TensorRT builder may depend on various system factors (GPU, OS/kernel, CPU, system load, available memory, etc.) that affect layer implementation availability and timing during the engine build.

The serialization of the engine is dependent on the layer implementations chosen for that engine and therefore is also dependent on these system factors. It is therefore not unexpected behavior if a given INetworkDefinition produces different engines which have different serializations over multiple runs. If the same engine in TensorRT produced different serializations over multiple runs, that would be unexpected behavior.
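
To make the distinction concrete, here is a sketch against the TensorRT 4 API (setup and cleanup omitted; in practice the two builds would typically happen in separate runs of the application):

/*Building twice from the same network definition: the auto-tuner may select
  different layer implementations each time, so the serialized sizes may differ*/
nvinfer1::ICudaEngine *engineA = builder->buildCudaEngine(*network);
nvinfer1::ICudaEngine *engineB = builder->buildCudaEngine(*network);

nvinfer1::IHostMemory *serializedA = engineA->serialize();
nvinfer1::IHostMemory *serializedB = engineB->serialize();
/*serializedA->size() and serializedB->size() are not guaranteed to be equal*/

/*Serializing the same engine twice, however, must produce identical data*/
nvinfer1::IHostMemory *first = engineA->serialize();
nvinfer1::IHostMemory *second = engineA->serialize();
/*first->size() == second->size() is expected here*/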


Thank you for your clarification.
I understand that there are several system factors which can impact the generated CUDA engine, but once a CUDA engine has been generated, it cannot be serialized differently.

Last questions regarding this issue:

  1. The maximum size gap I got was more than ~500 MB: the first serialized file was ~600 MB, and the second was less than ~100 MB. Both of them executed properly when I de-serialized them, despite the fact that the generated CUDA engines were different (due to these system factors).

    Is this an expected size gap?
    Do these system factors really have so much impact?
    Does it mean that my graph size and layers were degraded so much yet still executed correctly?
    Do I have any control over these system factors, and do you have any recommendations for the system state that will generate the best CUDA engine (in terms of performance and detection quality)?