When converting BERT onnx to TensorRT engine, get different num_layers


The code for conversion is:

import tensorrt as trt

def build_engine(model_file, max_ws=512 * 1024 * 1024, fp16=True):
    print("building engine")
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(TRT_LOGGER)
    config = builder.create_builder_config()
    config.max_workspace_size = max_ws
    if fp16:
        # builder.fp16_mode is deprecated; the builder-config flag is enough.
        config.set_flag(trt.BuilderFlag.FP16)
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(explicit_batch)
    with trt.OnnxParser(network, TRT_LOGGER) as parser:
        with open(model_file, 'rb') as model:
            # Check the parse result; otherwise parser errors go unnoticed.
            if not parser.parse(model.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
            print("network.num_layers", network.num_layers)
            engine = builder.build_engine(network, config=config)
    return engine

The layer count printed by print("network.num_layers", network.num_layers) does not match the built engine, which reports only 579 layers:

And when I ran inference with the engine, the result also differed from the PyTorch model's output. (The PyTorch result matches the ONNX Runtime result, and both are correct.)
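To make "the result was different" concrete, a small helper can quantify the gap between the TensorRT output and the PyTorch/ONNX Runtime reference. This is a minimal sketch (the function name and tolerances are my own, not from the thread); with FP16 enabled, small relative differences are expected, while large ones indicate a real conversion problem:

```python
import numpy as np

def report_mismatch(ref, test, rtol=1e-3, atol=1e-3):
    """Compare a reference output (PyTorch / ONNX Runtime) against a
    TensorRT output and report how far apart they are."""
    ref = np.asarray(ref, dtype=np.float32)
    test = np.asarray(test, dtype=np.float32)
    abs_diff = np.abs(ref - test)
    # Relative difference, guarding against division by zero.
    rel_diff = abs_diff / np.maximum(np.abs(ref), 1e-12)
    close = bool(np.allclose(ref, test, rtol=rtol, atol=atol))
    print("max abs diff: %.6f, max rel diff: %.6f, allclose: %s"
          % (abs_diff.max(), rel_diff.max(), close))
    return close
```

As a rough rule of thumb, FP16 typically shifts transformer outputs by something on the order of 1e-3 to 1e-2 in relative terms; an output that is completely different, as reported here, points at something other than precision loss.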

The “trtexec” command

trtexec --explicitBatch --onnx=bert_batch_1_sim.onnx --saveEngine=bert.engine

gave the same result as the “build_engine” function.

More information:

trtexec warning logs:

Some information about the engine (obtained from trtexec). The information looks good, but the inference result is wrong.


TensorRT Version: 7.0
GPU Type: Tesla T4
Nvidia Driver Version: 410.104
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Linux(Docker container)
Python Version (if applicable): 3.6
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.03-py3

Relevant Files

ONNX file

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi, please share the ONNX model and the script so that we can assist you better.

Alongside, you can try validating your model with the snippet below:


import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

Alternatively, you can try running your model with the trtexec command.


Thanks, I’m uploading my ONNX conversion code along with my ONNX file and TensorRT engine file right now.

I have uploaded my onnx file here:
ONNX file

I have tried the trtexec command but still got 579 num_layers and a wrong inference result.

My inference code is from here:
[tensorrt-utils/infer.py at 493aa3827ff2c9886436ee4cbe60fed79d5bd263 · rmccorm4/tensorrt-utils · GitHub]

Also, the PyTorch model and ONNX model outputs are identical, as shown below:
However, the TensorRT engine output is:


Hi, I tried check_model.py but got no output.

Hi @lyzs1225,

We recommend you try the latest TensorRT 7.2.x.
We also have a tool, polygraphy, to compare results between TRT and ONNX Runtime. This might help you.

Thank you.

Hi @spolisetty ,
I tried TensorRT 7.2 in the container nvcr.io/nvidia/tensorrt:20.12-py3 but still got the wrong answer. Could you help me determine why this happens?
The onnx file is the same file.

And how do I get the comparison tool?

We also have a tool, polygraphy, to compare results between TRT and ONNX Runtime.

Thank you!

Hi @spolisetty ,
Any progress on how to fix this problem?

Hi @lyzs1225,

Please try the polygraphy tool to compare results between TRT and ONNX Runtime; it will help you with debugging. For your reference,

Thank you.

Hi @spolisetty ,

The polygraphy tool shows the same wrong results as I got before.

Hi @spolisetty ,
Could you help me find out why the TRT inference gets a different answer?

Hi @lyzs1225,

Please allow us some time. We are looking into this issue.


Hi @spolisetty ,
Any progress on this?