Input shapes do not match input partial shapes stored in graph

clairewang0409 · May 24, 2024, 8:17am

Description

Hi, I have successfully run the Python example code from the "Accelerating Inference in TensorFlow with TensorRT User Guide - NVIDIA Docs", so I think my environment is set up correctly.
However, I get errors when I try to convert my own TensorFlow model to TensorRT.

In the first case, I get an error when I run:

import numpy as np

def input_fn():
    # Yield ten random batches with the model's input shape (1, 256, 256, 3).
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.build(input_fn=input_fn)
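For reference, the trailing comma in `yield inp1,` makes each element a one-element tuple, and the traceback further down shows `build()` doing `args, kwargs = _convert_to_tensor(inp)` and then `func(*args, **kwargs)` for each element. The sketch below is a simplified imitation of that loop, assuming tuple inputs, with a hypothetical stand-in `fake_converted_fn` in place of the converted function. It lets you confirm which shapes reach the model without TensorRT; it is not TF-TRT's actual code.

```python
import numpy as np

def input_fn():
    # Same generator as above: ten random NHWC batches of shape (1, 256, 256, 3).
    for _ in range(10):
        inp1 = np.random.normal(size=(1, 256, 256, 3)).astype(np.float32)
        yield inp1,  # trailing comma -> a 1-tuple of inputs

def fake_converted_fn(*args):
    # Hypothetical stand-in for the converted concrete function:
    # it only reports the shape and dtype of every positional input.
    return [(a.shape, a.dtype) for a in args]

def simulate_build(fn, input_fn):
    # Simplified imitation of the loop in TrtGraphConverterV2.build():
    # each yielded tuple is unpacked into positional arguments.
    seen = []
    for inp in input_fn():
        seen.append(fn(*inp))
    return seen

calls = simulate_build(fake_converted_fn, input_fn)
print(len(calls), calls[0])
```

Every call receives a single float32 tensor of shape (1, 256, 256, 3), so the calibration inputs themselves match the model's declared input.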

(1, 256, 256, 3) is the input shape of my original model, but I get the following error:

2024-05-24 15:40:25.344989: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.345062: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.352216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.353862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory:  -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.537977: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.538064: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.545232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.546854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory:  -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.569157: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:186] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
2024-05-24 15:40:25.573340: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:962] 

################################################################################
TensorRT unsupported/non-converted OP Report:
	- Conv2DBackpropInput -> 4x
	- Pack -> 4x
	- Shape -> 4x
	- StridedSlice -> 4x
	- NoOp -> 2x
	- Identity -> 1x
	- Placeholder -> 1x
--------------------------------------------------------------------------------
	- Total nonconverted OPs: 20
	- Total nonconverted OP Types: 7
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################

2024-05-24 15:40:25.574199: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1290] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 5 TRT Engines with  at least minimum_segment_size=3 nodes.
2024-05-24 15:40:25.574236: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 5
2024-05-24 15:40:25.576787: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 0 consisting of 56 nodes by TRTEngineOp_002_000.
2024-05-24 15:40:25.576886: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 1 consisting of 13 nodes by TRTEngineOp_002_001.
2024-05-24 15:40:25.576934: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 2 consisting of 13 nodes by TRTEngineOp_002_002.
2024-05-24 15:40:25.576978: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 3 consisting of 13 nodes by TRTEngineOp_002_003.
2024-05-24 15:40:25.577020: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 4 consisting of 18 nodes by TRTEngineOp_002_004.
2024-05-24 15:40:26.849448: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:847] TF-TRT Warning: Running native segment forTRTEngineOp_002_001 due to failure in verifying input shapes: Input shapes do not match input partial shapes stored in graph, for TRTEngineOp_002_001: [[1,32,32,64], [1,64,32,32]] != [[?,32,32,64], [?,32,32,64]]
2024-05-24 15:40:26.856632: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856696: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at trt_engine_op.cc:644 : INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856717: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
	 [[TRTEngineOp_002_001]]

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[13], line 29
     27 converter.convert()
     28 # converter.summary()
---> 29 converter.build(input_fn=input_fn)
     30 converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py:1495, in TrtGraphConverterV2.build(self, input_fn)
   1493     first_input = inp
   1494   args, kwargs = _convert_to_tensor(inp)
-> 1495   func(*args, **kwargs)
   1497 if self._need_trt_profiles():
   1498   # Disable profile generation.
   1499   self._for_each_trt_node(self._converted_graph_def,
   1500                           partial(_set_profile_generation_mode, False))

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1477, in ConcreteFunction.__call__(self, *args, **kwargs)
   1427 def __call__(self, *args, **kwargs):
   1428   """Executes the wrapped function.
   1429 
   1430   ConcreteFunctions have two signatures:
   (...)
   1475     TypeError: If the arguments do not match the function's signature.
   1476   """
-> 1477   return self._call_impl(args, kwargs)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/wrap_function.py:243, in WrappedFunction._call_impl(self, args, kwargs, cancellation_manager)
    241   return self._call_flat(args, self.captured_inputs)
    242 else:
--> 243   return super(WrappedFunction, self)._call_impl(
    244       args, kwargs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1495, in ConcreteFunction._call_impl(self, args, kwargs, cancellation_manager)
   1492     except TypeError:
   1493       raise structured_err
-> 1495 return self._call_with_flat_signature(args, kwargs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1549, in ConcreteFunction._call_with_flat_signature(self, args, kwargs, cancellation_manager)
   1544   if not isinstance(
   1545       arg, (ops.Tensor, resource_variable_ops.BaseResourceVariable)):
   1546     raise TypeError(f"{self._flat_signature_summary()}: expected argument "
   1547                     f"#{i}(zero-based) to be a Tensor; "
   1548                     f"got {type(arg).__name__} ({arg}).")
-> 1549 return self._call_flat(args, self.captured_inputs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1757, in ConcreteFunction._call_flat(self, args, captured_inputs, cancellation_manager)
   1753 possible_gradient_type = gradients_util.PossibleTapeGradientTypes(args)
   1754 if (possible_gradient_type == gradients_util.POSSIBLE_GRADIENT_TYPES_NONE
   1755     and executing_eagerly):
   1756   # No tape is watching; skip to running the function.
-> 1757   return self._build_call_outputs(self._inference_function.call(
   1758       ctx, args, cancellation_manager=cancellation_manager))
   1759 forward_backward = self._select_forward_and_backward_functions(
   1760     args,
   1761     possible_gradient_type,
   1762     executing_eagerly)
   1763 forward_function, args_with_tangents = forward_backward.forward()

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:381, in _EagerDefinedFunction.call(self, ctx, args, cancellation_manager)
    379 with _InterpolateFunctionError(self):
    380   if cancellation_manager is None:
--> 381     outputs = execute.execute(
    382         str(self.signature.name),
    383         num_outputs=self._num_outputs,
    384         inputs=args,
    385         attrs=attrs,
    386         ctx=ctx)
    387   else:
    388     outputs = execute.execute_with_cancellation(
    389         str(self.signature.name),
    390         num_outputs=self._num_outputs,
   (...)
    393         ctx=ctx,
    394         cancellation_manager=cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

InvalidArgumentError: Graph execution error:

ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
	 [[TRTEngineOp_002_001]] [Op:__inference_pruned_15869]
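The warning on the `TRTEngineOp_002_001` line compares the actual input shapes against the partial shapes stored at conversion time, where `?` marks an unknown dimension. The sketch below is a simplified model of that comparison, assuming a stored dimension matches when it is unknown or equal to the actual size; it is not TF-TRT's implementation, and the helper names are hypothetical. Fed the shapes copied from the log, it flags only the second input, whose dimensions 1 and 3 disagree with the stored `[?,32,32,64]`.

```python
from typing import Optional, Sequence

PartialShape = Sequence[Optional[int]]  # None plays the role of '?'

def dim_compatible(actual: int, stored: Optional[int]) -> bool:
    # An unknown stored dimension accepts any size; a known one must match exactly.
    return stored is None or actual == stored

def shape_compatible(actual: Sequence[int], stored: PartialShape) -> bool:
    # Ranks must agree and every dimension must be compatible.
    return len(actual) == len(stored) and all(
        dim_compatible(a, s) for a, s in zip(actual, stored))

def mismatches(actuals, stored_shapes):
    # Return (input index, actual shape, stored partial shape) for each failing input.
    return [(i, tuple(a), tuple(s))
            for i, (a, s) in enumerate(zip(actuals, stored_shapes))
            if not shape_compatible(a, s)]

# Shapes copied from the TRTEngineOp_002_001 warning in the log.
actual = [[1, 32, 32, 64], [1, 64, 32, 32]]
stored = [[None, 32, 32, 64], [None, 32, 32, 64]]
print(mismatches(actual, stored))
```

Because the check fails, the op falls back to the native segment, which then hits the same shape conflict inside `concatenate/concat`.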

In the second case, if I skip converter.build() and save the converted model directly, no error is raised at that point. However, the same error still occurs at inference time.

Because my original TensorFlow model runs successfully, I don't think the shape error is caused by the model architecture or the input data; I suspect it comes from the TF-TRT conversion.

How can I fix this problem? Is there a step in the conversion that I missed?

Environment

TensorRT Version: 8.4.3
GPU Type: NVIDIA GeForce RTX 3060
Nvidia Driver Version: 535.171.04
CUDA Version: 11.4
CUDNN Version: 8.9.2.26
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.9
TensorFlow Version (if applicable): 2.12.0

Relevant Files

The file below contains both “native_saved_model” and “tftrt_saved_model”:
models.zip (8.3 MB)

Steps To Reproduce

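import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants

# SAVED_MODEL_DIR, OUTPUT_SAVED_MODEL_DIR and X_test are defined in earlier notebook cells (not shown).

def crop(data, h=256, w=256, stride=128):
    # Split an NHWC batch into h x w tiles with the given stride; tiles that would
    # run past an edge are shifted back so that they end at that edge.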
    collect = []
    for i in range(0, np.ceil(data.shape[1]/stride).astype('int')*stride-h+1, stride):
        for j in range(0, np.ceil(data.shape[2]/stride).astype('int')*stride-w+1, stride):
            if i+h > data.shape[1] and j+w > data.shape[2]:
                collect.append(data[:,-h:,-w:,:])
            elif i+h > data.shape[1]:
                collect.append(data[:,-h:,j:j+w,:])
            elif j+w > data.shape[2]:
                collect.append(data[:,i:i+h,-w:,:])
            else:
                collect.append(data[:,i:i+h,j:j+w,:])
    crop_data = np.concatenate(collect, axis=0)

    return crop_data

converter = trt.TrtGraphConverterV2(
   input_saved_model_dir=SAVED_MODEL_DIR,
   precision_mode=trt.TrtPrecisionMode.FP32
)

def input_fn():
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.convert()
converter.build(input_fn=input_fn)
converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)

saved_model_loaded = tf.saved_model.load(OUTPUT_SAVED_MODEL_DIR, tags=[tag_constants.SERVING])
signature_keys = list(saved_model_loaded.signatures.keys())
print(signature_keys)

model = saved_model_loaded.signatures['serving_default']

X = np.concatenate([crop(X_test[0].reshape(-1, 256, 256, 3))], axis=0)     # X_test[0] is an image of shape (256, 256, 3)
image_input = tf.constant(X.astype('float32'))
predict_ = model(input_1=image_input)
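To see how many tiles `crop()` produces, and therefore what batch size reaches `model(input_1=...)`, its index arithmetic can be reproduced without NumPy. The helper `tile_origins` below is a hypothetical restatement of the two loops in `crop()`, assuming the image is at least h x w: it returns the top-left corner of each tile, shifting tiles that would overrun an edge back to that edge, as the negative slices do. For a 256 x 256 image it yields one tile, i.e. a batch of shape (1, 256, 256, 3), the same shape used in `input_fn`.

```python
import math

def tile_origins(height, width, h=256, w=256, stride=128):
    # Mirrors the loop bounds in crop(): ceil(size / stride) * stride - tile + 1.
    origins = []
    for i in range(0, math.ceil(height / stride) * stride - h + 1, stride):
        for j in range(0, math.ceil(width / stride) * stride - w + 1, stride):
            # A tile that would run past the edge is shifted back to end at the edge,
            # which is what the negative slices data[:, -h:, ...] do in crop().
            top = height - h if i + h > height else i
            left = width - w if j + w > width else j
            origins.append((top, left))
    return origins

print(tile_origins(256, 256))  # [(0, 0)]
print(tile_origins(300, 300))  # [(0, 0), (0, 44), (44, 0), (44, 44)]
```

Larger images give a batch dimension greater than 1, which the stored partial shapes allow since their batch dimension is `?`.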

AakankshaS · May 27, 2024, 8:49am

Hi @clairewang0409,
Based on the ConcatOp error, it looks like your model tries to concatenate two tensors of different sizes.
Have you tried running inference on the model with other frameworks?
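One observation, offered as a hedged reading of the log rather than a confirmed diagnosis: the two shapes in the ConcatOp error, [1,32,32,64] and [1,64,32,32], hold the same number of elements and look like one 64-channel 32 x 32 feature map in NHWC and NCHW layouts respectively. The NumPy sketch below reproduces the same class of failure with `np.concatenate` on the last axis (assuming the model's Keras `Concatenate` uses its default channels-last axis) and shows that transposing the NCHW tensor with axes (0, 2, 3, 1) makes the shapes compatible. It only illustrates the shape arithmetic; it does not show where in the TF-TRT graph the layout changed.

```python
import numpy as np

nhwc = np.zeros((1, 32, 32, 64), dtype=np.float32)  # first input in the error
nchw = np.zeros((1, 64, 32, 32), dtype=np.float32)  # second input in the error

def try_concat(a, b, axis=-1):
    # Concatenating along the channel axis requires every other axis to match.
    try:
        return np.concatenate([a, b], axis=axis).shape
    except ValueError as err:
        return f"ValueError: {err}"

print(try_concat(nhwc, nchw))  # fails: dimension 1 is 32 vs 64

# Converting NCHW -> NHWC moves the channel axis to the end.
nchw_as_nhwc = np.transpose(nchw, (0, 2, 3, 1))
print(nchw_as_nhwc.shape)               # (1, 32, 32, 64)
print(try_concat(nhwc, nchw_as_nhwc))   # (1, 32, 32, 128)
```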
