Description
I’m trying to build an engine from an ONNX network whose inputs have dynamic shapes. After creating an optimization profile and specifying the min, opt, and max shapes, the built engine doesn’t seem to incorporate this profile.
Environment
TensorRT Version: 8.4.1.5
GPU Type: V100
Nvidia Driver Version: 460
CUDA Version: 11.3
CUDNN Version:
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):
Relevant Files
Here is the ONNX model: m2m100_418M-init-decoder.onnx (Google Drive link)
Steps To Reproduce
Below is the code to reproduce.
from pathlib import Path

import tensorrt as trt

from translate.quantization.tensorrt import common

_folder = Path.cwd()
saved_models_path = _folder.joinpath("tensorrt_models")

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)


def build_and_save_engine(onnx_file_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(common.EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()

    print('Beginning ONNX file parsing')
    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            print("ERROR: Failed to parse the ONNX file.")
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    print('Completed parsing of ONNX file')

    print("Network inputs:")
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        print(tensor.name, trt.nptype(tensor.dtype), tensor.shape)

    # allow TensorRT to use up to 16 GB of GPU memory for tactic selection
    config.max_workspace_size = common.GiB(16)
    builder.max_batch_size = 1

    # set the optimization profile for the dynamic input shapes
    min_length, max_length = 1, 200
    opt_length = max_length // 2
    min_batch_size, max_batch_size = 1, 16
    num_beams = 1
    dim = 1024
    profile.set_shape('input_ids',
                      (min_batch_size * num_beams, 1),
                      (min_batch_size * num_beams, 1),
                      (min_batch_size * num_beams, 1))
    profile.set_shape('encoder_attention_mask',
                      (min_batch_size * num_beams, min_length),
                      (min_batch_size * num_beams, opt_length),
                      (min_batch_size * num_beams, max_length))
    profile.set_shape('encoder_hidden_states',
                      (min_batch_size * num_beams, min_length, dim),
                      (min_batch_size * num_beams, opt_length, dim),
                      (min_batch_size * num_beams, max_length, dim))
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    context = engine.create_execution_context()
    print(context.get_binding_shape(0))
    print(context.get_binding_shape(1))
    print(context.get_binding_shape(2))


if __name__ == "__main__":
    build_and_save_engine("m2m100_418M-init-decoder.onnx")
I expected it to print the shapes of the 3 inputs as
(1, -1)
(1, -1)
(1, -1, 1024)
since the second dimension is dynamic, but instead it prints
(1, 1)
(1, 1)
(1, 1, 1024)
Can someone please check whether there’s a problem with my implementation or whether it’s indeed a bug? Thank you.
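For reference, this is a small sketch of how I understand the engine-level vs. context-level shape APIs in TensorRT 8.4 are meant to behave (the function name, the sequence length, and the binding indices below are my own assumptions based on my input order, not something from the model):

```python
def inspect_and_set_shapes(engine, context, seq_len=50):
    """Sketch (TensorRT 8.4 Python API, untested here).

    The engine itself should report -1 on the dynamic axes; the execution
    context only reports concrete shapes once they have been pinned with
    set_binding_shape() before inference.
    """
    import tensorrt as trt  # deferred so the sketch can be read without TRT installed

    # Engine-level view: dynamic dimensions should appear as -1.
    for i in range(engine.num_bindings):
        print(engine.get_binding_name(i), engine.get_binding_shape(i))

    # Context-level view: pin each dynamic input to a concrete shape.
    # Binding indices assumed: 1 = encoder_attention_mask, 2 = encoder_hidden_states.
    context.set_binding_shape(1, (1, seq_len))
    context.set_binding_shape(2, (1, seq_len, 1024))

    # After pinning, the context should return the concrete shapes.
    for i in range(engine.num_bindings):
        print(context.get_binding_shape(i))
```

If that understanding is right, my repro above may be printing whatever the context defaults to rather than the engine’s dynamic (-1) shapes, which is part of what I’d like confirmed.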