Built engine failed to include optimization profile with dynamic input shapes #2166


I’m trying to build an engine from an ONNX network whose inputs have dynamic shapes. After creating an optimization profile and specifying the min, opt, and max shapes, the built engine doesn’t incorporate this profile.


TensorRT Version:
GPU Type: V100
Nvidia Driver Version: 460
CUDA Version: 11.3
CUDNN Version:
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):

Relevant Files

Here is the ONNX model link: m2m100_418M-init-decoder.onnx (Google Drive)

Steps To Reproduce

Below is the code to reproduce.

from pathlib import Path
import tensorrt as trt
from translate.quantization.tensorrt import common

_folder = Path.cwd()
saved_models_path = _folder.joinpath("tensorrt_models")
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def build_and_save_engine(onnx_file_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(common.EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    print('Beginning ONNX file parsing')
    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            print("ERROR: Failed to parse the ONNX file.")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
        print('Completed parsing of ONNX file')

    print("Network inputs:")
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)
    # allow TensorRT to use up to 16GB of GPU memory for tactic selection
    config.max_workspace_size = common.GiB(16)
    builder.max_batch_size = 1

    # set optimization profile for dynamic shape
    min_length, max_length = 1, 200
    opt_length = int(max_length / 2)
    min_batch_size, max_batch_size = 1, 16
    num_beams = 1
    dim = 1024

    profile.set_shape('input_ids', (min_batch_size * num_beams, 1),
                      (min_batch_size * num_beams, 1),
                      (min_batch_size * num_beams, 1))
    profile.set_shape('encoder_attention_mask', (min_batch_size * num_beams, min_length),
                      (min_batch_size * num_beams, opt_length),
                      (min_batch_size * num_beams, max_length))
    profile.set_shape('encoder_hidden_states', (min_batch_size * num_beams, min_length, dim),
                      (min_batch_size * num_beams, opt_length, dim),
                      (min_batch_size * num_beams, max_length, dim))


    # register the profile with the builder config
    config.add_optimization_profile(profile)
    engine = builder.build_engine(network, config)
    context = engine.create_execution_context()
    return engine

if __name__ == "__main__":
    build_and_save_engine(_folder.joinpath("m2m100_418M-init-decoder.onnx"))

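As a side note, TensorRT rejects a profile whose shapes are not ordered min ≤ opt ≤ max in every dimension, so it is worth sanity-checking the tuples before calling set_shape. Here is a small standalone check of the tuples used above (plain Python, no TensorRT required; validate_profile is my own illustrative helper, not a TensorRT API):

```python
# Standalone sanity check for optimization-profile shape tuples.
# validate_profile is a hypothetical helper, not part of TensorRT.

def validate_profile(min_shape, opt_shape, max_shape):
    """Return True if the ranks match and min <= opt <= max holds element-wise."""
    if not (len(min_shape) == len(opt_shape) == len(max_shape)):
        return False
    return all(lo <= mid <= hi
               for lo, mid, hi in zip(min_shape, opt_shape, max_shape))

# The tuples from the reproduction script (batch 1, seq length 1..200):
min_length, opt_length, max_length = 1, 100, 200
dim = 1024

assert validate_profile((1, 1), (1, 1), (1, 1))                             # input_ids
assert validate_profile((1, min_length), (1, opt_length), (1, max_length))  # encoder_attention_mask
assert validate_profile((1, min_length, dim), (1, opt_length, dim),
                        (1, max_length, dim))                               # encoder_hidden_states
print("all profiles ordered min <= opt <= max")
```

All three profiles in the script pass this ordering check, so the problem is not a malformed range.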
It’s supposed to print the shapes of the three inputs as

(1, -1)
(1, -1)
(1, -1, 1024)

since the second dimension is dynamic, but instead it prints

(1, 1)
(1, 1)
(1, 1, 1024)

Can someone please check if there’s any problem with my implementation or if it’s indeed a bug? Thank you.
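For context on that expectation: when the exporter writes a dimension as a symbolic dim_param (a name such as "sequence"), TensorRT reports it as -1 on the parsed network input; a dimension exported as a concrete integer stays fixed. A minimal sketch of that mapping (plain Python; reported_shape is an illustrative helper of mine, not a TensorRT function):

```python
# Illustrative only: how a parsed input shape is reported, given
# ONNX-style dims that are either ints (fixed) or strings
# (symbolic dim_param, which TensorRT shows as -1).

def reported_shape(onnx_dims):
    return tuple(d if isinstance(d, int) else -1 for d in onnx_dims)

# A model exported with a symbolic sequence dimension:
print(reported_shape([1, "sequence"]))        # -> (1, -1)
print(reported_shape([1, "sequence", 1024]))  # -> (1, -1, 1024)

# A model exported with the sequence dimension baked in as 1:
print(reported_shape([1, 1]))                 # -> (1, 1)
```

So a printout of (1, 1) rather than (1, -1) means TensorRT resolved the second dimension to a fixed 1 before the profile was ever consulted.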

Please refer to the links below related to custom plugin implementation and samples:

While the IPluginV2 and IPluginV2Ext interfaces are still supported for backward compatibility with TensorRT 5.1 and 6.0.x respectively, we recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt or IPluginV2IOExt interfaces instead.


I’m sorry but how are these plugins supposed to solve the problem?



Both input_ids and encoder_attention_mask have shape (batch, seq_length). So you can’t set input_ids to have a fixed seq_length of 1 while setting encoder_attention_mask to have a dynamic seq_length.

If the two dimensions can differ, they should have different names in the ONNX graph. Otherwise, TRT assumes they correspond to the same number.
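That constraint can be sketched as a plain-Python check (no TensorRT needed; check_shared_dims and the dim names "batch"/"seq_length" are my own illustrative assumptions): if two inputs share a dimension name, their profile ranges for that dimension must be identical.

```python
# Hypothetical checker: inputs whose dims share a name must be given
# the same (min, opt, max) range, mirroring how TensorRT treats
# same-named dimensions in the ONNX graph.

def check_shared_dims(dim_names_per_input, ranges_per_input):
    """dim_names_per_input: {input: (dim_name, ...)}
    ranges_per_input: {input: ((min,...), (opt,...), (max,...))}
    Returns a list of conflict messages (empty means consistent)."""
    seen = {}       # dim_name -> (min, opt, max)
    conflicts = []
    for name, dims in dim_names_per_input.items():
        mins, opts, maxs = ranges_per_input[name]
        for i, dim_name in enumerate(dims):
            rng = (mins[i], opts[i], maxs[i])
            if dim_name in seen and seen[dim_name] != rng:
                conflicts.append(f"{dim_name}: {seen[dim_name]} vs {rng}")
            seen.setdefault(dim_name, rng)
    return conflicts

# The profile from the question: input_ids pins seq_length to 1 while
# encoder_attention_mask makes the same seq_length dynamic (1..200).
names = {
    "input_ids": ("batch", "seq_length"),
    "encoder_attention_mask": ("batch", "seq_length"),
}
ranges = {
    "input_ids": ((1, 1), (1, 1), (1, 1)),
    "encoder_attention_mask": ((1, 1), (1, 100), (1, 200)),
}
print(check_shared_dims(names, ranges))
# -> ['seq_length: (1, 1, 1) vs (1, 100, 200)']
```

The fix is therefore either to give input_ids the same dynamic range as encoder_attention_mask, or to re-export the ONNX model so the two dimensions carry different dim_param names.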

Thank you.