Different inference time when loading engine from serialized file

I’m measuring inference times for a MobileNetV2 model.

I’m comparing two different processes (sketched below):

1 - I build a TRT engine from the ONNX file and run inference directly with that engine.
2 - I build a TRT engine from the ONNX file, serialize it, and save it to a file. Then I deserialize the engine from that file and run inference.
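
Schematically, the comparison looks like this (just a sketch built from the functions shown below; `run_inference` is a placeholder for my actual inference code, and the file names are only examples):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Process 1: build the engine from the ONNX file and use it directly.
engine = build_engine_onnx('mobilenetV2.onnx')
run_inference(engine)                      # placeholder: measures and prints inference time

# Process 2: serialize the engine to a plan file, deserialize it, then run inference again.
save_engine(engine, 'mobilenetV2_fp32.plan')
runtime = trt.Runtime(TRT_LOGGER)
loaded_engine = load_engine(runtime, 'mobilenetV2_fp32.plan')
run_inference(loaded_engine)               # same inference code, but noticeably slower
```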

I found that when I run inference with the deserialized engine, the inference times are much slower than when I run inference directly with the engine I just built (i.e. without saving and reloading it).

The function for building the engine is:

```python
def build_engine_onnx(model_file):
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network(common.EXPLICIT_BATCH) as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = common.GiB(1)
        # Load the Onnx model and parse it in order to populate the TensorRT network.
        with open(model_file, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None
        network_inputs = [network.get_input(i) for i in range(network.num_inputs)]
        input_names = [_input.name for _input in network_inputs]
        config = builder.create_builder_config()
        profile = builder.create_optimization_profile()
        profile.set_shape('input_2:0', (1, 224, 224, 3), (1, 224, 224, 3), (1, 224, 224, 3))
        config.add_optimization_profile(profile)
        return builder.build_engine(network, config)
```

The function for saving the engine:

```python
def save_engine(engine, file_name):
    buf = engine.serialize()
    with open(file_name, 'wb') as f:
        f.write(buf)
```

The function for loading the engine:

```python
def load_engine(trt_runtime, plan_path):
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
```

The function I’m using for inference is the same in both cases.
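
It follows the usual pattern from the TensorRT Python samples. Roughly (a simplified sketch, not my exact script, assuming the `allocate_buffers` / `do_inference_v2` helpers from the samples' `common.py`):

```python
import time
import numpy as np
import common  # helpers from the TensorRT Python samples (allocate_buffers, do_inference_v2)

def run_inference(engine):
    # Buffers and the execution context are created once, outside the timed region.
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    with engine.create_execution_context() as context:
        # Dummy NHWC input matching the optimization profile (1, 224, 224, 3);
        # my real script copies a preprocessed image into the pinned buffer instead.
        dummy = np.random.random((1, 224, 224, 3)).astype(np.float32)
        np.copyto(inputs[0].host, dummy.ravel())
        start = time.perf_counter()
        trt_outputs = common.do_inference_v2(
            context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        print('inference time: %.3f ms' % ((time.perf_counter() - start) * 1000))
    return trt_outputs
```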

Hi,
Could you try running your model with the trtexec command and share the "--verbose" log if the issue persists?

You can refer to the link below for the list of supported operators; if any operator is not supported, you will need to create a custom plugin for that operation.

Also, please share your model and script if you haven't already, so that we can help you better.

Thanks!

When I load and run the model with trtexec, the inference time is the lowest of all, so I guess the TRT engine itself is fine. Maybe I am not loading it correctly?

The model and scripts:

common.py (8.3 KB)
onnx_mobilenetV2_load.py (4.8 KB)
mobilenetV2_fp32.plan (13.8 MB)
class_labels.txt (11.2 KB)