Failure in verifying input shapes: Input shapes are inconsistent on the batch dimension

xut08 · July 1, 2021, 11:24am

Description

I have a model that runs successfully under tensorflow-serving. I then converted it with the saved_model_cli command; the full command line is below:

docker run --rm --user 3004 --gpus all -it \
    -v /path/to/tensorflow_serving:/work/tf_model \
    -e CUDA_VISIBLE_DEVICES=1 \
    harbor.private.com/dev/tf:1.15.5-gpu /usr/local/bin/saved_model_cli convert \
    --dir /work/tf_model/buyer_sent_model_pb_02/01 \
    --output_dir /work/tf_model/buyer_sent_model_trt/02 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 16 --is_dynamic_op True

Then I serve it with tensorflow-serving, command line:

docker run -d --gpus all -p 8501:8501 --mount type=bind,source=/path/to/tensorflow_serving/my_model_dir,target=/models/my_model_dir \
-e MODEL_NAME=my_model_name -e CUDA_VISIBLE_DEVICES=1 \
-e TF_FORCE_GPU_ALLOW_GROWTH='true' \
-t harbor.private.com/dev/tf-serving:2.4.1-gpu

My input:

{
    "inputs": {
             "Input-Token": data1,
             "Input-Segment": data2
        }
}

data1 and data2 are both lists of length 16.

data1:

[
    [101, 3766, 752, 8024, 6814, 3341, 6760, 6760, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 8238, 697, 1259, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 6206, 743, 2643, 5948, 1947, 6163, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 930, 702, 6963, 3221, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2218, 3221, 2769, 6821, 6804, 791, 1921, 1157, 2802, 2458, 3341, 4500, 749, 671, 833, 6230, 2533, 679, 1916, 3265, 102, 0, 0, 0],
    [101, 2769, 3221, 6206, 2864, 4706, 5296, 3890, 5011, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1962, 4638, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 1355, 749, 1557, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6843, 3819, 4706, 3344, 1408, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 872, 1962, 6435, 7309, 6821, 702, 743, 671, 6843, 671, 3221, 2582, 720, 702, 6843, 3791, 102, 0, 0, 0, 0, 0, 0, 0],
    [101, 1119, 3247, 2458, 1993, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 6716, 7770, 8725, 8175, 1408, 8043, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2769, 743, 749, 6821, 702, 121, 119, 8146, 4638, 4385, 1762, 4684, 2970, 4802, 6371, 3119, 6573, 2218, 1377, 809, 749, 511, 1968, 102],
    [101, 4692, 1168, 928, 2622, 1726, 1908, 678, 1521, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 155, 4772, 7027, 7481, 1377, 809, 3022, 679, 6585, 6716, 4638, 3688, 6132, 720, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [101, 2571, 6853, 4157, 3766, 3300, 2571, 6853, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
]

data2:

[
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
]

This set of data works fine.
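For reference, a request like the one above can be sent to TensorFlow Serving's REST predict endpoint roughly as follows. This is a minimal sketch, not the poster's actual client: the host localhost is an assumption, the port and model name are taken from the docker command above, and build_payload and predict are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint: "localhost" is a placeholder; port 8501 and the model name
# my_model_name come from the docker command in the post.
PREDICT_URL = "http://localhost:8501/v1/models/my_model_name:predict"


def build_payload(token_ids, segment_ids):
    """Build a columnar ("inputs") request body for the two named inputs."""
    return {"inputs": {"Input-Token": token_ids, "Input-Segment": segment_ids}}


def predict(token_ids, segment_ids, url=PREDICT_URL, timeout=30.0):
    """POST the batch to TensorFlow Serving and return the decoded JSON reply."""
    body = json.dumps(build_payload(token_ids, segment_ids)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling predict(data1, data2) would return the server's JSON reply, or raise an HTTP error if the server rejects the request.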

But when I change data2 to:

[
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

This combination of data1 and data2 fails.
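Before suspecting the converted graph, it may be worth ruling out a malformed request on the client side. The sketch below checks that both inputs are rectangular and share the same (rows, columns) shape. That the model expects matching shapes for Input-Token and Input-Segment is an assumption based on the working example above, and batch_shape and check_inputs are hypothetical helper names.

```python
def batch_shape(batch):
    """Return (rows, columns) of a rectangular nested list, or raise ValueError."""
    if not batch:
        raise ValueError("batch is empty")
    widths = {len(row) for row in batch}
    if len(widths) != 1:
        raise ValueError(f"ragged batch, row lengths: {sorted(widths)}")
    return len(batch), widths.pop()


def check_inputs(token_ids, segment_ids):
    """Raise ValueError unless both inputs share the same (rows, columns) shape."""
    token_shape = batch_shape(token_ids)
    segment_shape = batch_shape(segment_ids)
    if token_shape != segment_shape:
        raise ValueError(
            f"Input-Token is {token_shape} but Input-Segment is {segment_shape}"
        )
    return token_shape
```

Running check_inputs(data1, data2) on each batch before sending it would flag any difference in row count or row length between the two inputs.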

The server logs the following:

2021-07-01 08:29:53.363285: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.734463: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,25,768], [1,25,768]]
2021-07-01 08:29:58.863914: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:587] Running native segment forTRTEngineOp_26 due to failure in verifying input shapes: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,30,768], [1,30,768]]
2021-07-01 08:29:58.984471: W

On the client, I got:

{'error': 'Timed out waiting for notification'}

It seems TensorFlow collapses data2 from a batch of 16 to a batch of 1?
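The warning says that TRTEngineOp_26 received two inputs whose leading (batch) dimensions are 16 and 1. The log alone does not establish where the batch-1 tensor comes from. The sketch below only illustrates the kind of check the message describes; it is not TF-TRT's actual implementation, and batch_dims_consistent is a hypothetical name.

```python
def batch_dims_consistent(shapes):
    """Return True when every input shape has the same leading (batch) dimension."""
    if not shapes:
        return True
    first = shapes[0][0]
    return all(shape[0] == first for shape in shapes)


# Shapes reported in the server log for TRTEngineOp_26.
logged_shapes = [[16, 25, 768], [1, 25, 768]]
print(batch_dims_consistent(logged_shapes))  # prints False
```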

What is the problem in my case? Am I missing something?

Environment

tensorflow:1.15.5-gpu for convert
tensorflow-serving: 2.4.1-gpu for serving
Both Docker images were pulled from the official repositories on Docker Hub.

TensorRT Version:
GPU Type: 2080ti, both convert and serving
Nvidia Driver Version: 455.38 in Host
CUDA Version: convert: 10.0, in docker,
CUDNN Version: convert: 7.6.2, in docker
Operating System + Version: centos7
Python Version (if applicable): 3.6.9 in tf-1.15.5-gpu, when convert
TensorFlow Version (if applicable): convert: 1.15.5-gpu, serving: 2.4.1-gpu
Container (if container which image + tag): convert: tensorflow:1.15.5-gpu, serving: tensorflow-serving:2.4.1-gpu

spolisetty · July 1, 2021, 11:30am

Hi @xut08,

We recommend posting your concern on a TensorFlow-related platform.

If you’re interested, you can also try the TensorFlow NGC container.

Thank you.

NVES · July 1, 2021, 11:37am

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the list of supported operators. If an operator is not supported, you need to create a custom plugin to support that operation.

https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md

Also, please share your model and script, if you have not already, so that we can help you better.

Thanks!

xut08 · July 1, 2021, 11:53am

Thank you for your reply.
In fact, I’m trying to convert a TensorFlow model to a TensorRT engine with saved_model_cli, as mentioned in Quick Start Guide :: NVIDIA Deep Learning TensorRT Documentation. I thought it was a higher-level wrapper script for TF-TRT.
From your reply, is it more likely that tensorflow-serving caused my problem, not TF-TRT?

spolisetty · July 1, 2021, 12:14pm

@xut08,

Sorry for not making this clear in my previous reply. Could you please confirm whether you face this issue only when you use TF-TRT, or without TF-TRT as well? That would help isolate whether the problem is with the TensorFlow model.

Thank you.

xut08 · July 1, 2021, 1:24pm

OK, I guess the confusing part is the script I used to convert the TensorFlow model. In most cases, we call the TF-TRT API to convert a TF model to ONNX, but I used saved_model_cli, a script provided officially by TensorFlow. Here are my steps:

1. I have a TensorFlow model in .pb format; it can be run by tensorflow-serving.
2. I converted the TensorFlow model with saved_model_cli; the log is below:
   tf2tensorrt.log (172.9 KB)
3. The converted model is still in .pb format and can be run by tensorflow-serving. I am quite sure the conversion modifies the model with TensorRT, but I am not sure whether it is done with the standard TF-TRT API. Maybe I was misled by web posts saying that saved_model_cli calls TF-TRT.
4. I ran the converted model from step 2 with tensorflow-serving and ran into the problem described above.

Perhaps the problem is more likely caused by tensorflow?

Thank you again for your patience.

spolisetty · July 7, 2021, 4:52am

@xut08,

Thank you for sharing the details. saved_model_cli does not currently use TF-TRT. We recommend using the TF-TRT API directly. For your reference: Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation

xut08 · July 7, 2021, 7:55am

Thank you for replying.
In fact, saved_model_cli does use TF-TRT. The code below is from the Docker image pulled with docker pull tensorflow/tensorflow:1.15.5-gpu.

Below is the source code of saved_model_cli:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from tensorflow.python.tools.saved_model_cli import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

And in tensorflow.python.tools.saved_model_cli, we can find the function convert_with_tensorrt:


def convert_with_tensorrt(args):
  """Function triggered by 'convert tensorrt' command.

  Args:
    args: A namespace parsed from command line.
  """
  # Import here instead of at top, because this will crash if TensorRT is
  # not installed
  from tensorflow.contrib import tensorrt  # pylint: disable=g-import-not-at-top
  tensorrt.create_inference_graph(
      None,
      None,
      max_batch_size=args.max_batch_size,
      max_workspace_size_bytes=args.max_workspace_size_bytes,
      precision_mode=args.precision_mode,
      minimum_segment_size=args.minimum_segment_size,
      is_dynamic_op=args.is_dynamic_op,
      input_saved_model_dir=args.dir,
      input_saved_model_tags=args.tag_set.split(','),
      output_saved_model_dir=args.output_dir)

In dist-packages/tensorflow_core/contrib/tensorrt/python/__init__.py,

"""Exposes the python wrapper for TensorRT graph transforms."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# pylint: disable=unused-import,line-too-long
from tensorflow.contrib.tensorrt.python.trt_convert import create_inference_graph
# pylint: enable=unused-import,line-too-long

But I agree that we should use from tensorflow.python.compiler.tensorrt import trt_convert as trt; it is clearer and more controllable. I have already tried it out:

import sys
from tensorflow.python.compiler.tensorrt import trt_convert as trt


DEFAULT_TRT_MAX_WORKSPACE_SIZE_BYTES = 1 << 30
input_dir = sys.argv[1]
output_dir = sys.argv[2]

converter = trt.TrtGraphConverter(
    input_saved_model_dir=input_dir,
    is_dynamic_op=True,
    max_batch_size=16,
    maximum_cached_engines=128
)
converter.convert()
# converter.build(input_fn=my_input_fn)
converter.save(output_dir)

converter.save goes fine, but when I serve this model with tensorflow-serving, with an input batch of 16, I still get:

# on server side
2021-07-06 11:13:48.730329: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at trt_engine_op.cc:492 : Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]
# on client
{'error': '2 root error(s) found.\n  (0) Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]\n\t [[{{node TRTEngineOp_26}}]]\n  (1) Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_26: [[16,8,768], [1,8,768]]\n\t [[{{node TRTEngineOp_26}}]]\n\t [[dense_73/Softmax/_7]]\n0 successful operations.\n0 derived errors ignored.'}
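When comparing many such messages across requests, it can help to extract the op name and the reported shapes programmatically. The following is a small sketch that assumes the message format seen in the logs above; parse_shape_error and the SHAPE_ERROR pattern are hypothetical names written for this illustration.

```python
import ast
import re

# Matches "... for TRTEngineOp_26: [[16,8,768], [1,8,768]]" as seen in the posted logs.
SHAPE_ERROR = re.compile(
    r"inconsistent on the batch dimension, for (?P<op>\w+): "
    r"(?P<shapes>\[\[[\d,\s\[\]]*?\]\])"
)


def parse_shape_error(line):
    """Return (op_name, shapes) for the first match in a log line, else None."""
    match = SHAPE_ERROR.search(line)
    if match is None:
        return None
    # The captured text contains only digits, commas, spaces, and brackets,
    # so literal_eval turns it into a nested list of ints.
    return match.group("op"), ast.literal_eval(match.group("shapes"))
```

Applied to the server-side line above, this would yield the op name TRTEngineOp_26 together with the two shapes, which makes it easy to see that only the leading dimension differs.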

We used keras, bert4keras, and tf.keras to build this model, so I guess there might be a compatibility problem in how the model is constructed. We are trying to reconstruct this model and train it again; hopefully this helps solve the problem.

spolisetty · July 11, 2021, 6:00pm

Hi @xut08,

Are you still facing the issue?
