Description
The code for conversion is:
import tensorrt as trt

def build_engine(model_file, max_ws=512 * 1024 * 1024, fp16=True):
    print("building engine")
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    builder.fp16_mode = fp16  # deprecated in TRT 7; the config flag below is what takes effect
    config = builder.create_builder_config()
    config.max_workspace_size = max_ws
    if fp16:
        config.flags |= 1 << int(trt.BuilderFlag.FP16)
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(explicit_batch)
    with trt.OnnxParser(network, TRT_LOGGER) as parser:
        with open(model_file, 'rb') as model:
            parsed = parser.parse(model.read())
            if not parsed:
                # Surface parser errors instead of silently building a partial network
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
            print("network.num_layers", network.num_layers)
            last_layer = network.get_layer(network.num_layers - 1)
            # network.mark_output(last_layer.get_output(0))
            engine = builder.build_engine(network, config=config)
            return engine
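For completeness, a minimal sketch of how the returned engine can be serialized to disk so it can be reloaded later (the output file name here is just an example):

engine = build_engine("bert_batch_1_sim.onnx")
with open("bert_py.engine", "wb") as f:
    f.write(engine.serialize())  # IHostMemory supports the buffer protocol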
The network's layer count is printed by:
print("network.num_layers", network.num_layers)
But the built engine reports only 579 layers.
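For reference, a minimal sketch for reading the built engine's layer count directly (file name taken from the trtexec command below). TensorRT fuses layers while building, so engine.num_layers is normally smaller than network.num_layers:

engine = build_engine("bert_batch_1_sim.onnx")
# Layer fusion during the build reduces this below network.num_layers,
# so a smaller count by itself is expected behavior.
print("engine.num_layers:", engine.num_layers)  # 579 in this case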
And when I tried to infer with the engine, the result was also different from the PyTorch model's result. (The PyTorch result matches the ONNX Runtime result, and both are correct.)
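A minimal sketch of how the TensorRT output can be compared with ONNX Runtime, assuming static shapes, pycuda, and onnxruntime installed. The input name, shape, and dtype in the feed are placeholders; the real feed must match the model's bindings:

import numpy as np
import onnxruntime as ort
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

def trt_infer(engine, feed):
    # Allocate one device buffer per binding, copy inputs in, run, copy outputs out.
    with engine.create_execution_context() as context:
        bindings, outputs, dev = [], {}, []
        for i in range(engine.num_bindings):
            dtype = trt.nptype(engine.get_binding_dtype(i))
            shape = tuple(engine.get_binding_shape(i))
            mem = cuda.mem_alloc(trt.volume(shape) * np.dtype(dtype).itemsize)
            dev.append(mem)
            bindings.append(int(mem))
            if engine.binding_is_input(i):
                arr = np.ascontiguousarray(feed[engine.get_binding_name(i)].astype(dtype))
                cuda.memcpy_htod(mem, arr)
            else:
                outputs[i] = np.empty(shape, dtype=dtype)
        context.execute_v2(bindings)
        for i, arr in outputs.items():
            cuda.memcpy_dtoh(arr, dev[i])
        return [outputs[i] for i in sorted(outputs)]

feed = {"input_ids": np.random.randint(0, 1000, (1, 128)).astype(np.int64)}  # placeholder
ort_out = ort.InferenceSession("bert_batch_1_sim.onnx").run(None, feed)
trt_out = trt_infer(build_engine("bert_batch_1_sim.onnx"), feed)
print(np.max(np.abs(ort_out[0] - trt_out[0])))  # a large value confirms the mismatch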
The trtexec command
trtexec --explicitBatch --onnx=bert_batch_1_sim.onnx --saveEngine=bert.engine
gave the same result as the build_engine function.
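One detail worth noting: the command above does not pass --fp16, so trtexec builds an FP32 engine by default, while build_engine enables FP16. A sketch of commands for comparing both precisions (engine file names are examples):

# FP32 build (trtexec default) and FP16 build of the same model
trtexec --explicitBatch --onnx=bert_batch_1_sim.onnx --saveEngine=bert_fp32.engine
trtexec --explicitBatch --fp16 --onnx=bert_batch_1_sim.onnx --saveEngine=bert_fp16.engine
# Run a saved engine and print its output tensors for comparison
trtexec --explicitBatch --loadEngine=bert_fp32.engine --dumpOutput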
More information:
trtexec warning logs:
Some information about the engine (obtained from trtexec). The information looks correct, but the inference result is wrong.
Environment
TensorRT Version: 7.0
GPU Type: Tesla T4
Nvidia Driver Version: 410.104
CUDA Version: 10.2
CUDNN Version: 7.6.5.32
Operating System + Version: Linux (Docker container)
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.03-py3
Relevant Files
Steps To Reproduce