Different inference time when loading engine from serialized file

I’m measuring inference times for a MobileNetV2 model.

I’m doing two different processes:

1 - I build a TRT engine from an ONNX file and run inference directly with that engine.
2 - I build a TRT engine from an ONNX file, serialize the engine and save it to a file. Then I deserialize the engine from that file and run inference.

I found that inference is much slower when I run it from the deserialized engine than when I run it directly, without saving and loading the engine.

The function for building the engine is:

```python
def build_engine_onnx(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(common.EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = common.GiB(1)
        # Load the ONNX model and parse it in order to populate the TensorRT network.
        with open(model_file, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        network_inputs = [network.get_input(i) for i in range(network.num_inputs)]
        input_names = [_input.name for _input in network_inputs]
        config = builder.create_builder_config()
        profile = builder.create_optimization_profile()
        profile.set_shape('input_2:0', (1, 224, 224, 3), (1, 224, 224, 3), (1, 224, 224, 3))
        config.add_optimization_profile(profile)
        return builder.build_engine(network, config)
```

The function for saving the engine:

```python
def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)
```

The function for loading the engine:

```python
def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
```
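As a sanity check on the save/load path itself: serialization to disk is plain byte I/O, so the plan file round trip cannot by itself change the engine. A minimal sketch of the pattern, with a placeholder byte string standing in for what `engine.serialize()` would return:

```python
import os
import tempfile

def save_plan(plan_bytes, file_name):
    # Write the serialized engine bytes to disk, as save_engine() does.
    with open(file_name, 'wb') as f:
        f.write(plan_bytes)

def load_plan(file_name):
    # Read the plan file back; deserialize_cuda_engine() would consume these bytes.
    with open(file_name, 'rb') as f:
        return f.read()

# Placeholder standing in for engine.serialize() output.
fake_plan = b'\x00TRT-plan-placeholder'
path = os.path.join(tempfile.mkdtemp(), 'engine.plan')
save_plan(fake_plan, path)
restored = load_plan(path)
assert restored == fake_plan  # bytes survive the round trip unchanged
```

If the bytes match, any slowdown after deserialization comes from elsewhere (e.g. first-run setup), not from the file itself.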

The function I’m using for inference is the same in both cases.

Can you try running your model with the trtexec command, and share the "--verbose" log if the issue persists?

You can refer to the link below for the list of all supported operators; if any operator is not supported, you need to create a custom plugin for that operation.

Also, we request you to share your model and script, if not shared already, so that we can help you better.


When loading and running the model with trtexec, the inference time is the lowest of all, so I guess the TRT engine itself has no problem. Maybe I am not loading it correctly?

The model and scripts:

common.py (8.3 KB) onnx_mobilenetV2_load.py (4.8 KB) mobilenetV2_fp32.plan (13.8 MB)
class_labels.txt (11.2 KB)

Hi @ebasarte,

Sorry for the delayed response. The code looks right to us.
We recommend you share the ONNX model and the steps to reproduce the issue for better assistance.

The code to build the engine is here:
PythonONNX_build.zip (13.9 MB)

ONNX model is there as well, in the data folder.

You just need to run: python3 onnx_mobilenetV2_build.py
It runs inference and serializes the TRT engine, called mobilenetV2_fp32.plan.

Results are:
25 images were processed in 0.4767234290000033 seconds
Average time for image preprocessing is 0.00822811083999909 seconds
Average time for image inference is 0.007421662240002434 seconds

If you use that serialized engine with the following code, you’ll see that inference times differ from the ones obtained when building the engine.

To run the code you just need to do: python3 onnx_mobilenetV2_load.py
PythonONNX_load.zip (13.9 MB)

Results are:
25 images were processed in 2.2262240489999954 seconds
Average time for image preprocessing is 0.016043223000003763 seconds
Average time for image inference is 0.022590343320000556 seconds

Hi @ebasarte,

Sorry for the delayed response. We tried running the scripts you’ve shared, but we are facing the following error. It looks like an absolute path is used somewhere. We request you to please update it to a relative path and share again; we tried modifying it ourselves but couldn’t find the right place.

```
File "/test/inf_time_issue_176870/PythonONNX_build/common.py", line 137, in locate_files
    raise FileNotFoundError("Could not find {:}. Searched in data paths: {:}".format(filename, data_paths))
FileNotFoundError: Could not find 1149224971_ed5180a58a.jpg. Searched in data paths: ['/home/naitec/Test/PythonONNX_build/data']
```

Thank you.

Here are the files:

PythonONNX_build.zip (13.9 MB)
PythonONNX_load.zip (13.9 MB)

Hi @ebasarte,

We could successfully run the scripts you’ve shared, but observed only a small difference in inference time:
onnx_mobilenetV2_build.py - 0.6956509780138731 seconds
onnx_mobilenetV2_load.py - 0.9290754743851721 seconds

Could you please share the following details about the environment you’re using:

TensorRT Version :
GPU Type :
Nvidia Driver Version :
CUDA Version :
CUDNN Version :
Operating System + Version :
Python Version (if applicable) :
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Thank you.

TensorRT Version : 7.1.3
GPU Type : Volta (Jetson XavierNX)
Nvidia Driver Version : The one that come with JetPack
CUDA Version : 10.2.89
CUDNN Version : 8.0.0
Operating System + Version : Ubuntu 18.04 Bionic
Python Version (if applicable) : 3.6.9
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

Hi @ebasarte,

Could you please do a few warm-up runs of common.do_inference_v2() before starting the timing? The very first run usually takes a long time due to setup.
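The warm-up pattern suggested above can be sketched as follows. Note this is a minimal illustration: `fake_inference` is a trivial placeholder standing in for a real call such as `common.do_inference_v2(...)`, which is not available here.

```python
import time

def timed_runs(infer_fn, n_runs, n_warmup):
    """Call infer_fn n_warmup times untimed, then return the mean of n_runs timed calls."""
    for _ in range(n_warmup):
        infer_fn()  # warm-up: excluded from the statistics
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer_fn()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

def fake_inference():
    # Placeholder workload standing in for common.do_inference_v2(...)
    return sum(range(1000))

avg = timed_runs(fake_inference, n_runs=10, n_warmup=3)
```

Excluding the warm-up iterations this way keeps one-time setup cost (lazy CUDA context creation, first kernel launches) out of the reported average.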

If possible, we request you to collect an Nsight Systems profile so that we can take a closer look.

Thank you.

I ran inference on 500 pictures; the results for the last ones are:

Image preprocessed in 0.005385305000004337 seconds
Image inference took in 0.011298801000066305 seconds
*Test image 490:
2.spaghetti squash:0.004367718007415533
4.neck brace:0.0017375056631863117
5.butternut squash:0.001498603611253202
Image preprocessed in 0.006695319000073141 seconds
Image inference took in 0.011397712999951182 seconds
*Test image 491:
5.military uniform:0.014318372122943401
Image preprocessed in 0.005846583999982613 seconds
Image inference took in 0.011406609000005119 seconds
*Test image 492:
1.cabbage butterfly:0.7592569589614868
4.sulphur butterfly:0.013447572477161884
Image preprocessed in 0.006242294999992737 seconds
Image inference took in 0.011473617000092418 seconds
*Test image 493:
Image preprocessed in 0.005618680999987191 seconds
Image inference took in 0.011412945000074615 seconds
*Test image 494:
2.military uniform:0.13058039546012878
3.bow tie:0.057123467326164246
4.academic gown:0.04148273169994354
Image preprocessed in 0.009077140000044892 seconds
Image inference took in 0.011408593000055589 seconds
*Test image 495:
5.tiger cat:0.022765250876545906
Image preprocessed in 0.006300503000034041 seconds
Image inference took in 0.011421616999996331 seconds
*Test image 496:
1.cabbage butterfly:0.14264458417892456
5.sulphur butterfly:0.05314701050519943
Image preprocessed in 0.00787512500005505 seconds
Image inference took in 0.011428112999965379 seconds
*Test image 497:
1.bee eater:0.09392005950212479
4.picket fence:0.05556768178939819
Image preprocessed in 0.005983192000030613 seconds
Image inference took in 0.011348112999939985 seconds
*Test image 498:
1.Labrador retriever:0.7874717712402344
2.German short-haired pointer:0.04523288458585739
3.Chesapeake Bay retriever:0.02505187690258026
4.flat-coated retriever:0.013429956510663033
5.Great Dane:0.012554347515106201
Image preprocessed in 0.0061854960000573556 seconds
Image inference took in 0.011384176999968076 seconds
*Test image 499:
1.military uniform:0.4383077025413513
3.scuba diver:0.03753417730331421
Image preprocessed in 0.006677591999959986 seconds
Image inference took in 0.01129640199997084 seconds
*Test image 500:
1.military uniform:0.8591780662536621
2.assault rifle:0.022453853860497475
500 images were processed in 12.753411824999944 seconds
Average time for image preprocessing is 0.006524533127747839 seconds
Average time for image inference is 0.013333324656687292 seconds

The last inference times are around 0.011 s, lower than the ones I got at first, but still a bit higher than the ones I get when I build the engine directly.
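A quick arithmetic sketch of why the reported average can sit above the steady-state per-image time: a handful of slow first iterations pull the mean up even when all later runs are fast. The numbers below are illustrative placeholders, not values from the logs above.

```python
# Hypothetical run: 5 slow warm-up iterations followed by 495 steady ones.
warmup = [0.2] * 5        # illustrative slow first iterations (setup cost)
steady = [0.011] * 495    # illustrative steady-state per-image time
times = warmup + steady

overall_avg = sum(times) / len(times)   # mean including warm-up runs
steady_avg = sum(steady) / len(steady)  # mean of steady-state runs only
# overall_avg is noticeably higher than steady_avg even though 99% of
# the runs took the steady-state time.
```

This is why timing only after a few warm-up calls, or reporting the steady-state mean separately, gives a fairer comparison between the two scripts.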

I cannot collect the Nsight Systems profile; I’m not working with that tool.

Hi @ebasarte,

Thank you for sharing the output. From the above reply it’s not entirely clear: could you please confirm whether performance (processing time) improved after doing warm-up runs of common.do_inference_v2()? Please also share the corresponding output for the other case (building the engine and running inference directly), i.e. the equivalent of:
500 images were processed in 12.753411824999944 seconds

Then we can compare how much difference there is, which will help us assist you better.

The output when building the engine and running inference directly:

500 images were processed in 8.108642259000021 seconds
Average time for image preprocessing is 0.00623156832199993 seconds
Average time for image inference is 0.0073180870059989045 seconds

The output when loading the serialized engine:

500 images were processed in 15.97795191299997 seconds
Average time for image preprocessing is 0.009737948125752256 seconds
Average time for image inference is 0.01302419243512679 seconds

So it looks like processing time didn’t improve after doing warm-up.

Thank you @ebasarte for sharing the details. Please allow us some time to work on this issue.