Failure in verifying input shapes: Input shapes are inconsistent on the batch dimension

Description

I have a model can be successfully run tenorflow-serving. Then I covert it with commond saved_model_cli, below is detail command line:

docker run --rm --user 3004 --gpus all -it \
    -v /path/to/tensorflow_serving:/work/tf_model \
    -e CUDA_VISIBLE_DEVICES=1 \
    harbor.private.com/dev/tf:1.15.5-gpu /usr/local/bin/saved_model_cli convert \
    --dir /work/tf_model/buyer_sent_model_pb_02/01 \
    --output_dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 16 --is_dynamic_op True

Then I serve it with tensorflow-serving, command line:

docker run -d --gpus all -p 8501:8501 --mount type=bind,source=/path/to/tensorflow_serving/my_model_dir,target=/models/my_model_dir \
-e MODEL_NAME=my_model_name -e CUDA_VISIBLE_DEVICES=1 \
-e TF_FORCE_GPU_ALLOW_GROWTH='true' \
-t harbor.private.com/dev/tf-serving:2.4.1-gpu

my input:

{
    "inputs": {
             "Input-Token": data1,
             "Input-Segment": data2
        }
}

data1 and data2 are both lists, length is 16.

data1:

[
    [101, 3766, 752, 8024, 6814, 3341, 6760, 6760, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 8238, 697, 1259, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 6206, 743, 2643, 5948, 1947, 6163, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 930, 702, 6963, 3221, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 2769, 6821, 6804, 791, 1921, 1157, 2802, 2458, 3341, 4500, 749, 671, 833, 6230, 2533, 679, 1916, 3265, 102, 0, 0, 0],
    [101, 2769, 3221, 6206, 2864, 4706, 5296, 3890, 5011, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1962, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1355, 749, 1557, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6843, 3819, 4706, 3344, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 872, 1962, 6435, 7309, 6821, 702, 743, 671, 6843, 671, 3221, 2582, 720, 702, 6843, 3791, 102, 0, 0, 0, 0, 0, 0, 0],
    [101, 1119, 3247, 2458, 1993, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6716, 7770, 8725, 8175, 1408, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 743, 749, 6821, 702, 121, 119, 8146, 4638, 4385, 1762, 4684, 2970, 4802, 6371, 3119, 6573, 2218, 1377, 809, 749, 511, 1968, 102],
    [101, 4692, 1168, 928, 2622, 1726, 1908, 678, 1521, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 155, 4772, 7027, 7481, 1377, 809, 3022, 679, 6585, 6716, 4638, 3688, 6132, 720, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2571, 6853, 4157, 3766, 3300, 2571, 6853, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

data2:

[
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
]

This set of data works fine.

But when change data2 to:

[
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

This set of data1 and data2 run into troubles.

On the server side it has log as below:

2021-07-01 08:29:53.363285: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.734463: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.863914: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,30,768], [1,30,768]]
2021-07-01 08:29:58.984471: W 

on the client, I got:

{'error': 'Timed out waiting for notification'}

It seems tensorflow compresses the data2 from a list of 16 length to 1 length?

What is the problem in my case, do I miss someting?

Environment

tensorflow:1.15.5-gpu for convert
tensorflow-serving: 2.4.1-gpu for serving
both docker is pulled from offical site in docker hub

TensorRT Version:
GPU Type: 2080ti, both convert and serving
Nvidia Driver Version: 455.38 in Host
CUDA Version: convert: 10.0, in docker,
CUDNN Version: convert: 7.6.2, in docker
Operating System + Version: centos7
Python Version (if applicable): 3.6.9 in tf-1.15.5-gpu, when convert
TensorFlow Version (if applicable): convert: 1.15.5-gpu, serving: 2.4.1-gpu
Container (if container which image + tag): convert: tensorflow:1.15.5-gpu, serving: tensorflow-serving:2.4.1-gpu

Hi @xut08,

We recommend you to post your concern on Tensorflow related platform.

If you’re interested you can also try on Tensorflow NGC container.
https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

Thank you.

Hi,
Can you try running your model with trtexec command, and share the “”–verbose"" log in case if the issue persist

You can refer below link for all the supported operators list, in case any operator is not supported you need to create a custom plugin to support that operation

Also, request you to share your model and script if not shared already so that we can help you better.

Thanks!

Thank you for your repley.
In fact I’m trying to convert a tensorflow model to a TensorRT engines as is mentioned in Quick Start Guide :: NVIDIA Deep Learning TensorRT Documentation with saved_model_cli. I thought it was a higher wrapped script for tf-trt.
From your reply, it more likely that tensorflow-serving caused my problem, not tf-trt?

@xut08,

Sorry for not making it clear in my previous reply, could you please confirm are you facing this issue when you use tf-trt only or without using tf-trt as well ? it would be helpful to isolate, if problem is with tensorflow model.

Thank you.

OK, I guess the confusing part is the script I use to covert tensorflow model. In most case, we call TF-TFT api to convert a tf model to onnx, but I used saved_model_cli, a script provided by tensorflow official. Here is my step:

  1. I have a tensorflow model end with .pb format, it can be run by tensorflow-serving.
  2. I convert the tensorflow model with saved_model_cli, below is the log:
    tf2tensorrt.log (172.9 KB)
    The coverted model is still end with .pb, and can be ran by tensorflow-serving. I quite sure it modifies the model with tensorRT, but not sure if it is done with the standed TF-TFT api, maybe I’m misleaded by web as someone says saved_model_cli calls TF-TFT.
  3. I ran the converted model in step2 with tensorflow-serving and ran into the problem as is described above.

Perhaps the problem is more likely caused by tensorflow?

Thank you again for your patience.

@xut08,

Thank you for sharing the details. saved_model_cli does not use TF-TRT currently. We recommend you to please use TF-TRT api directly. For your reference, Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation

Thank you for replying.
In fact, save_model_cli do use TF-TRT. Code from docker docker pull tensorflow/tensorflow:1.15.5-gpu

Below is the source code of save_model_cli:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from tensorflow.python.tools.saved_model_cli import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

And in tensorflow.python.tools.saved_model_cli, we can find function convert_with_tensorrt


def convert_with_tensorrt(args):
  """Function triggered by 'convert tensorrt' command.

  Args:
    args: A namespace parsed from command line.
  """
  # Import here instead of at top, because this will crash if TensorRT is
  # not installed
  from tensorflow.contrib import tensorrt  # pylint: disable=g-import-not-at-top
  tensorrt.create_inference_graph(
      None,
      None,
      max_batch_size=args.max_batch_size,
      max_workspace_size_bytes=args.max_workspace_size_bytes,
      precision_mode=args.precision_mode,
      minimum_segment_size=args.minimum_segment_size,
      is_dynamic_op=args.is_dynamic_op,
      input_saved_model_dir=args.dir,
      input_saved_model_tags=args.tag_set.split(','),
      output_saved_model_dir=args.output_dir)

In dist-packages/tensorflow_core/contrib/tensorrt/python/__init__.py,

"""Exposes the python wrapper for TensorRT graph transforms."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# pylint: disable=unused-import,line-too-long
from tensorflow.contrib.tensorrt.python.trt_convert import create_inference_graph
# pylint: enable=unused-import,line-too-long

But I agree that we should use from tensorflow.python.compiler.tensorrt import trt_convert as trt, it’s more clear and controllable. And I have already tried it out:

import sys
from tensorflow.python.compiler.tensorrt import trt_convert as trt


DEFAULT_TRT_MAX_WORKSPACE_SIZE_BYTES = 1 << 30
input_dir = sys.argv[1]
output_dir = sys.argv[2]

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_dir,
    is_dynamic_op=True,
    max_batch_size=16,
    maximum_cached_engines=128
)
converter.convert()
# converter.build(input_fn=my_input_fn)
converter.save(output_dir)

converter.save goes fine, but when I sevre this model with tenflow-serving, with a input batch in 16, still got

# on server side
2021-07-06 11:13:48.730329: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at trt_engine_op.cc:492 : Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]
# on client
{'error': '2 root error(s) found.\n  (0) Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]\n\t [[{{node TRTEngineOp_26}}]]\n  (1) Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]\n\t [[{{node TRTEngineOp_26}}]]\n\t [[dense_73/Softmax/_7]]\n0 successful operations.\n0 derived errors ignored.'}

We use kerasbert4keras and tf.keras to build this model, so I guess there might be a compatibility problem in constructing the model. We are trying to recontruct this model and train again, hope this may help to solve the problem.

Hi @xut08,

Are you still facing the issue.