Description
I am trying to deploy a TensorRT plan model to my Triton Inference Server with Triton's ‘allow_ragged_batch’ option enabled.
When allow_ragged_batch is enabled, Triton concatenates the variable-length input of each request into a single flattened 1-D tensor instead of building a padded 2-D batch.
So I add an extra input, “x_tst_lengths”, that holds the actual length of each request, and use it inside the model to convert the flattened 1-D tensor back into a 2-D tensor, padded to the maximum length in the batch.
This input “x_tst_lengths” has a dynamic shape [batch_size], where batch_size is the number of requests in the batch.
However, after describing the model in PyTorch, exporting it to ONNX, and finally converting it to a TensorRT plan with trtexec, “x_tst_lengths” is automatically treated as a shape input, which cannot have a dynamic shape.
How can I prevent this input from being converted to a shape input?
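For reference, the unflatten-and-pad step the model performs is roughly equivalent to this NumPy sketch (function and variable names are illustrative, not the actual model code):

```python
import numpy as np

def unflatten_and_pad(flat, lengths):
    """Split a flattened 1-D ragged batch back into per-request rows,
    zero-padded to the longest request in the batch."""
    max_len = int(lengths.max())
    out = np.zeros((len(lengths), max_len), dtype=flat.dtype)
    offset = 0
    for i, n in enumerate(lengths):
        out[i, :n] = flat[offset:offset + n]
        offset += n
    return out

# Two requests of lengths 2 and 4, flattened by ragged batching:
flat = np.array([1, 2, 3, 4, 5, 6])
lengths = np.array([2, 4])
padded = unflatten_and_pad(flat, lengths)
# padded -> [[1, 2, 0, 0],
#            [3, 4, 5, 6]]
```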
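(For context on the workaround I am considering: TensorRT seems to classify a tensor as a shape input when it feeds shape-computing layers such as Reshape or Expand. My assumption is that if the lengths were consumed only by elementwise arithmetic, e.g. a broadcast comparison that builds a padding mask, they might stay an ordinary execution tensor. A hedged NumPy sketch of that mask construction, with illustrative names:)

```python
import numpy as np

def length_mask(lengths, max_len):
    # Broadcast comparison: lengths participates only in elementwise
    # math, never as a reshape dimension, so (presumably) it would not
    # be promoted to a shape input by the ONNX->TensorRT conversion.
    positions = np.arange(max_len)                 # shape [max_len]
    return positions[None, :] < lengths[:, None]   # shape [batch, max_len]

mask = length_mask(np.array([2, 4]), 4)
# mask -> [[True, True, False, False],
#          [True, True, True,  True ]]
```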
Relevant error message:
[TensorRT] ERROR: x_tst_lengths: shape input must have build-time dimensions, has dimensions [-1]
[TensorRT] ERROR: Network validation failed.
Source code:
import tensorrt as trt

TRT_LOGGER = trt.Logger()
TRT_LOGGER.min_severity = trt.Logger.Severity.VERBOSE

batch_sizes = [8]
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_batch_size = batch_sizes[-1]
    with open('model.onnx', 'rb') as model:
        if not parser.parse(model.read()):
            print('parser.parse failed')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
    with builder.create_builder_config() as config:
        config.max_workspace_size = 23000 * 1 << 20  # int((1 << 34) * 2.7)
        for i in range(network.num_layers):
            print(i, network.get_layer(i).name, network.get_layer(i).type)
        print(network.get_input(0).name, network.get_input(0).shape, network.get_input(0).is_shape_tensor)
        print(network.get_input(1).name, network.get_input(1).shape, network.get_input(1).is_shape_tensor)
        print(network.get_output(0).is_shape_tensor)
        # for i in batch_sizes:
        profile = builder.create_optimization_profile()
        profile.set_shape('x_tst', (1,), (500,), (1000,))
        profile.set_shape('x_tst_lengths', (1,), (8,), (8,))
        profile.set_shape('length_scale', (1,), (500,), (1000,))
        config.add_optimization_profile(profile)
        print('num_optimization_profiles : ', config.num_optimization_profiles)
        with builder.build_engine(network, config) as engine:
            with open("model.plan", 'wb') as f:
                f.write(engine.serialize())
Environment
TensorRT Version: 7.2.1 (ngc container 20.10)
GPU Type: RTX 2080Ti
Nvidia Driver Version: 455.38
CUDA Version: 11.1 (ngc container 20.10)
CUDNN Version: 8.0.4 (ngc container 20.10)
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9 (onnx->tensorrt), 3.7.9(pytorch)
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.5.0
Baremetal or Container (if container which image + tag):