"Mismatch between allocated memory size and expected size of serialized engine." Occurs when using the terexec command

The following error occurs when using the trtexec command.

trtexec: engine.cpp:1094: bool nvinfer1::rt::Engine::deserialize(const void*, std::size_t, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*): Assertion `size >= bsize && "Mismatch between allocated memory size and expected size of serialized engine."' failed.
Aborted (core dumped)

The TensorRT engine was generated with TLT 2.0 using the following procedure.

・Exporting the model

!tlt-export detectnet_v2 \
-m $model_path/model.tlt \
-k $APIKEY \
-o $out_path/export_model_fp16.etlt \
--data_type fp16

・Generating the engine

!tlt-converter -k $APIKEY \
-o output_cov/Sigmoid,output_bbox/BiasAdd \
-d 3,400,496 \
-e $engine_path/tensort_fp16.engine \
$model_path/export_model_fp16.etlt
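
As a side note, here is a minimal Python sketch (not part of the original procedure, and assuming the TensorRT Python bindings are installed in the container) to check whether the resulting engine deserializes with the local TensorRT runtime. A failure at this step usually points to a version mismatch between the container that built the engine and the one trying to load it.

import tensorrt as trt

# Path to the engine produced by tlt-converter above (adjust as needed).
ENGINE_PATH = "tensort_fp16.engine"

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

print("Deserialized OK" if engine is not None else "Deserialization failed")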

Environment

I use the following docker image from NGC.
nvcr.io/nvidia/tensorrt:19.03-py3
TensorRT Version: TensorRT 5.1.2

Sorry to trouble you. I appreciate it.

Hi,

This looks like a TAO Toolkit related issue. We will move this post to the TAO Toolkit forum.

Thanks!


ok!!

I am running trtexec with this command.
/usr/src/tensorrt/bin/trtexec --loadEngine=$engine_path/tensort_fp16.engine --fp16 --batch=1 --useSpinWait

What is the command line?

I have tried this command.
/usr/src/tensorrt/bin/trtexec --loadEngine=$engine_path/tensort_fp16.engine --fp16 --batch=1 --useSpinWait

When I run it, this error occurs.

trtexec: engine.cpp:1094: bool nvinfer1::rt::Engine::deserialize(const void*, std::size_t, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*): Assertion `size >= bsize && "Mismatch between allocated memory size and expected size of serialized engine."' failed.
Aborted (core dumped)

The engine file used here was generated with TLT 2.0.

Can you try --batch=16 to check if it works?

The same error occurs.

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=$engine_path/tensort_fp16.engine --fp16 --batch=16 --useSpinWait
[I] loadEngine:$engine_path/tensort_fp16.engine
[I] fp16
[I] batch: 16
[I] useSpinWait
trtexec: engine.cpp:1094: bool nvinfer1::rt::Engine::deserialize(const void*, std::size_t, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*): Assertion `size >= bsize && "Mismatch between allocated memory size and expected size of serialized engine."' failed.
Aborted (core dumped)

May I know which docker you are using?

TLT2.0
nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

I am using this docker when running trtexec.
nvcr.io/nvidia/tensorrt:19.03-py3

Please add the option below, generate a new engine, and retry.
-m 1

I have tried it, but I get the same error.

trtexec: engine.cpp:1094: bool nvinfer1::rt::Engine::deserialize(const void*, std::size_t, nvinfer1::IGpuAllocator&, nvinfer1::IPluginFactory*): Assertion `size >= bsize && "Mismatch between allocated memory size and expected size of serialized engine."' failed.
Aborted (core dumped)

Please use a docker with the same TensorRT version.
In nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3, the TensorRT version is 7.0.0.

Thus, please use the docker below instead. Its TensorRT version is also 7.0.0.
docker pull nvcr.io/nvidia/tensorrt:20.03-py3

See info in
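
For reference, a quick way to confirm the TensorRT version inside either container is to query the Python bindings (a minimal sketch, assuming they are installed in the image):

import tensorrt as trt

# Should report 7.0.0.x in both tlt-streamanalytics:v2.0_py3 and tensorrt:20.03-py3.
print(trt.__version__)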

Thank you very much.
It worked successfully.

By the way, the trtexec documentation says “It is useful for benchmarking networks on random or user-provided input data.”
How can I use user-provided input data?

Can you provide the explicit link for the doc?

This document.

Usually you can ignore it; random input data is sufficient for benchmarking.

If you still want to load your own input data, there is an option for it in trtexec (--loadInputs).

For example,
/usr/src/tensorrt/bin/trtexec --deploy=googlenet.prototxt --output=prob --int8 --loadInputs=data:googlenet.caffemodel --exportOutput=output_int8_1.txt

Thank you very much.

I have tried --loadInputs as follows, but it does not appear to be working.

First, I converted the image to a dat file with this code.
Also, the image resolution is 4k.

import PIL.Image
import numpy as np

# Load the 4K image and write its raw float32 pixel values to a .dat file for --loadInputs.
im = PIL.Image.open('/images/01.jpg')
data = np.asarray(im, dtype=np.float32)
data.tofile('/images/01.dat')

Then I ran trtexec.
/usr/src/tensorrt/bin/trtexec --loadEngine=$engine_path/model.engine --fp16 --batch=1 --useSpinWait --loadInputs='Input_tensor_1:/images/01_P8260008.dat'

However, the throughput is almost the same as when no input image is specified.
I expected the throughput to be lower because I am feeding in 4K images. Am I doing something wrong?
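
For what it is worth, the engine above was built with -d 3,400,496, so trtexec always runs inference at that fixed input shape regardless of how large the supplied data file is, which by itself can explain why the throughput does not change. Below is a minimal sketch of a conversion that matches those dimensions, assuming CHW float32 ordering (an assumption; the expected layout is not confirmed in this thread) and using an illustrative output filename.

import PIL.Image
import numpy as np

# Assumed target shape from the tlt-converter call above: 3 channels, 400 x 496 (CHW).
CHANNELS, HEIGHT, WIDTH = 3, 400, 496

im = PIL.Image.open('/images/01.jpg').convert('RGB')
im = im.resize((WIDTH, HEIGHT))             # PIL resize takes (width, height)
data = np.asarray(im, dtype=np.float32)     # HWC -> float32
data = np.transpose(data, (2, 0, 1))        # HWC -> CHW (assumed engine layout)
assert data.shape == (CHANNELS, HEIGHT, WIDTH)
data.tofile('/images/01_resized.dat')       # raw binary for --loadInputs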

Since the original issue is already fixed, shall we close this topic? For your latest question about trtexec and --loadInputs, please create a new topic in the TensorRT forum for better help. Thanks a lot.

ok!
Thank you very much.