When converting BERT onnx to TensorRT engine, get different num_layers


The code for conversion is:

import tensorrt as trt

def build_engine(model_file, max_ws=512 * 1024 * 1024, fp16=True):
    print("building engine")
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    config = builder.create_builder_config()
    config.max_workspace_size = max_ws
    if fp16:
        # builder.fp16_mode is deprecated; the builder-config flag is enough.
        config.set_flag(trt.BuilderFlag.FP16)
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(explicit_batch)
    with trt.OnnxParser(network, TRT_LOGGER) as parser:
        with open(model_file, 'rb') as model:
            # Check the parse result; otherwise parser errors go unnoticed.
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
            print("network.num_layers", network.num_layers)
            engine = builder.build_engine(network, config=config)
    return engine

The layer count printed by print("network.num_layers", network.num_layers) does not match the built engine, which reports only 579 layers:

And when I ran inference with the engine, the result also differed from the PyTorch model's output. (The PyTorch result matches the ONNX Runtime result, and both are correct.)
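To make "the result was different" concrete, a small helper can quantify the gap between the TensorRT output and the PyTorch/ONNX Runtime reference. This is a minimal sketch (the function name and tolerances are my own, not from the thread); with FP16 enabled, small relative differences are expected, while large ones indicate a real conversion problem:

```python
import numpy as np

def report_mismatch(ref, test, rtol=1e-3, atol=1e-3):
    """Compare a reference output (PyTorch / ONNX Runtime) against a
    TensorRT output and report how far apart they are."""
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    abs_diff = np.abs(ref - test)
    # Relative difference, guarding against division by zero.
    rel_diff = abs_diff / np.maximum(np.abs(ref), 1e-12)
    close = bool(np.allclose(ref, test, rtol=rtol, atol=atol))
    print("max abs diff: %.6f, max rel diff: %.6f, allclose: %s"
          % (abs_diff.max(), rel_diff.max(), close))
    return close
```

As a rough rule of thumb, FP16 typically shifts transformer outputs by something on the order of 1e-3 to 1e-2 in relative terms; an output that is completely different, as reported here, points at something other than precision loss.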

The “trtexec” command

trtexec --explicitBatch --onnx=bert_batch_1_sim.onnx --saveEngine=bert.engine

gave the same result as the “build_engine” function.

More information:

trtexec warning logs:

Some information about the engine (obtained from trtexec). The information looks good, but the inference result is wrong.


TensorRT Version: 7.0
GPU Type: Tesla T4
Nvidia Driver Version: 410.104
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Linux(Docker container)
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.03-py3

Relevant Files

ONNX file

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi, please share the ONNX model and the script so that we can assist you better.

Alongside, you can try validating your model with the snippet below:


import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

Alternatively, you can try running your model with the trtexec command.


Thanks, I’m uploading my ONNX conversion code along with my ONNX file and TensorRT engine file right now.

I have uploaded my onnx file here:
ONNX file

I have tried the trtexec command but still got 579 num_layers and a wrong inference result.

My inference code is from here:
[tensorrt-utils/infer.py at 493aa3827ff2c9886436ee4cbe60fed79d5bd263 · rmccorm4/tensorrt-utils · GitHub]

Also, the PyTorch model and ONNX model outputs are identical, as shown below:
However, the TensorRT engine output is:


Hi, I tried check_model.py but got no output.

Hi @lyzs1225,

We recommend you try the latest TensorRT 7.2.x.
We also have a tool, polygraphy, to compare results between TRT and ONNX Runtime. This might help you.

Thank you.

Hi @spolisetty ,
I tried TensorRT 7.2 in the container nvcr.io/nvidia/tensorrt:20.12-py3 but still got the wrong answer. Could you help me determine why this happens?
The onnx file is the same file.

And how do I get the comparison tool?

We also have a tool, polygraphy, to compare results between TRT and ONNX Runtime.

Thank you!

Hi @spolisetty ,
Any progress on how to fix this problem?

Hi @lyzs1225,

Please try the polygraphy tool to compare results between TRT and ONNX Runtime; it will help you with debugging. For your reference,

Thank you.

Hi @spolisetty ,

The polygraphy tool shows the same wrong results as I got before.

Hi @spolisetty ,
Could you help me find out why the TRT inference gets a different answer?

Hi @lyzs1225,

Please allow us some time. We are looking into this issue.


Hi @spolisetty ,
Any progress on this?