Input shapes do not match input partial shapes stored in graph

Description

Hi, I ran the Python example code from the "Accelerating Inference in TensorFlow with TensorRT User Guide" (NVIDIA Docs) successfully, so I believe my environment is set up correctly.
However, I get errors when I try to convert my own TensorFlow model to TensorRT.

In the first situation, I get an error when I run:

def input_fn():
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.build(input_fn=input_fn)

(1,256,256,3) is the input size of my original model, but I got the error below:

2024-05-24 15:40:25.344989: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.345062: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.352216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.353862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory:  -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.537977: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.538064: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.545232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.546854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory:  -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.569157: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:186] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
2024-05-24 15:40:25.573340: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:962] 

################################################################################
TensorRT unsupported/non-converted OP Report:
	- Conv2DBackpropInput -> 4x
	- Pack -> 4x
	- Shape -> 4x
	- StridedSlice -> 4x
	- NoOp -> 2x
	- Identity -> 1x
	- Placeholder -> 1x
--------------------------------------------------------------------------------
	- Total nonconverted OPs: 20
	- Total nonconverted OP Types: 7
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################

2024-05-24 15:40:25.574199: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1290] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 5 TRT Engines with  at least minimum_segment_size=3 nodes.
2024-05-24 15:40:25.574236: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 5
2024-05-24 15:40:25.576787: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 0 consisting of 56 nodes by TRTEngineOp_002_000.
2024-05-24 15:40:25.576886: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 1 consisting of 13 nodes by TRTEngineOp_002_001.
2024-05-24 15:40:25.576934: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 2 consisting of 13 nodes by TRTEngineOp_002_002.
2024-05-24 15:40:25.576978: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 3 consisting of 13 nodes by TRTEngineOp_002_003.
2024-05-24 15:40:25.577020: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 4 consisting of 18 nodes by TRTEngineOp_002_004.
2024-05-24 15:40:26.849448: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:847] TF-TRT Warning: Running native segment forTRTEngineOp_002_001 due to failure in verifying input shapes: Input shapes do not match input partial shapes stored in graph, for TRTEngineOp_002_001: [[1,32,32,64], [1,64,32,32]] != [[?,32,32,64], [?,32,32,64]]
2024-05-24 15:40:26.856632: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856696: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at trt_engine_op.cc:644 : INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856717: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
	 [[TRTEngineOp_002_001]]

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[13], line 29
     27 converter.convert()
     28 # converter.summary()
---> 29 converter.build(input_fn=input_fn)
     30 converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py:1495, in TrtGraphConverterV2.build(self, input_fn)
   1493     first_input = inp
   1494   args, kwargs = _convert_to_tensor(inp)
-> 1495   func(*args, **kwargs)
   1497 if self._need_trt_profiles():
   1498   # Disable profile generation.
   1499   self._for_each_trt_node(self._converted_graph_def,
   1500                           partial(_set_profile_generation_mode, False))

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1477, in ConcreteFunction.__call__(self, *args, **kwargs)
   1427 def __call__(self, *args, **kwargs):
   1428   """Executes the wrapped function.
   1429 
   1430   ConcreteFunctions have two signatures:
   (...)
   1475     TypeError: If the arguments do not match the function's signature.
   1476   """
-> 1477   return self._call_impl(args, kwargs)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/wrap_function.py:243, in WrappedFunction._call_impl(self, args, kwargs, cancellation_manager)
    241   return self._call_flat(args, self.captured_inputs)
    242 else:
--> 243   return super(WrappedFunction, self)._call_impl(
    244       args, kwargs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1495, in ConcreteFunction._call_impl(self, args, kwargs, cancellation_manager)
   1492     except TypeError:
   1493       raise structured_err
-> 1495 return self._call_with_flat_signature(args, kwargs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1549, in ConcreteFunction._call_with_flat_signature(self, args, kwargs, cancellation_manager)
   1544   if not isinstance(
   1545       arg, (ops.Tensor, resource_variable_ops.BaseResourceVariable)):
   1546     raise TypeError(f"{self._flat_signature_summary()}: expected argument "
   1547                     f"#{i}(zero-based) to be a Tensor; "
   1548                     f"got {type(arg).__name__} ({arg}).")
-> 1549 return self._call_flat(args, self.captured_inputs, cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1757, in ConcreteFunction._call_flat(self, args, captured_inputs, cancellation_manager)
   1753 possible_gradient_type = gradients_util.PossibleTapeGradientTypes(args)
   1754 if (possible_gradient_type == gradients_util.POSSIBLE_GRADIENT_TYPES_NONE
   1755     and executing_eagerly):
   1756   # No tape is watching; skip to running the function.
-> 1757   return self._build_call_outputs(self._inference_function.call(
   1758       ctx, args, cancellation_manager=cancellation_manager))
   1759 forward_backward = self._select_forward_and_backward_functions(
   1760     args,
   1761     possible_gradient_type,
   1762     executing_eagerly)
   1763 forward_function, args_with_tangents = forward_backward.forward()

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:381, in _EagerDefinedFunction.call(self, ctx, args, cancellation_manager)
    379 with _InterpolateFunctionError(self):
    380   if cancellation_manager is None:
--> 381     outputs = execute.execute(
    382         str(self.signature.name),
    383         num_outputs=self._num_outputs,
    384         inputs=args,
    385         attrs=attrs,
    386         ctx=ctx)
    387   else:
    388     outputs = execute.execute_with_cancellation(
    389         str(self.signature.name),
    390         num_outputs=self._num_outputs,
   (...)
    393         ctx=ctx,
    394         cancellation_manager=cancellation_manager)

File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     50 try:
     51   ctx.ensure_initialized()
---> 52   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                       inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:
     55   if name is not None:

InvalidArgumentError: Graph execution error:

ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
	 [[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
	 [[TRTEngineOp_002_001]] [Op:__inference_pruned_15869]
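
For reference, the TF-TRT warning in the log compares the runtime input shapes of TRTEngineOp_002_001 with the partial shapes stored in the graph. Below is a minimal pure-Python sketch of that kind of comparison, using the values copied from the warning. The helper name matches_partial and the NHWC/NCHW reading of the mismatch are assumptions for illustration, not TF-TRT internals.

```python
def matches_partial(shape, partial):
    """Return True if a concrete shape fits a partial shape.

    None stands for an unknown ('?') dimension in the partial shape.
    """
    return len(shape) == len(partial) and all(
        p is None or s == p for s, p in zip(shape, partial)
    )


# Values copied from the TF-TRT warning for TRTEngineOp_002_001.
actual = [[1, 32, 32, 64], [1, 64, 32, 32]]
stored = [[None, 32, 32, 64], [None, 32, 32, 64]]

for index, (shape, partial) in enumerate(zip(actual, stored)):
    print(index, matches_partial(shape, partial))

# Assumption: the second input may be an NHWC tensor that arrived in NCHW order.
# Reordering its axes as (N, H, W, C) gives a shape that fits the stored one.
as_nhwc = [actual[1][axis] for axis in (0, 2, 3, 1)]
print(as_nhwc, matches_partial(as_nhwc, stored[1]))
```

Only the second input fails the check, and it holds the same sizes in a different order, which would be consistent with a data-layout mix-up somewhere before the concatenate node. The log does not confirm that cause.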

In the second situation, if I skip converter.build before saving the converted model, I get no error at that point. But the same error still occurs at inference.

Because my original TensorFlow model runs successfully, I think the shape error is caused not by the model architecture or the input data, but possibly by the TF-TRT converter.

How can I fix this problem? Is there a conversion step I missed?

Environment

TensorRT Version: 8.4.3
GPU Type: NVIDIA GeForce RTX 3060
Nvidia Driver Version: 535.171.04
CUDA Version: 11.4
CUDNN Version: 8.9.2.26
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.9
TensorFlow Version (if applicable): 2.12.0

Relevant Files

The file below includes “native_saved_model” and “tftrt_saved_model”:
models.zip (8.3 MB)

Steps To Reproduce

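import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants

def crop(data, h=256, w=256, stride=128):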
    collect = []
    for i in range(0, np.ceil(data.shape[1]/stride).astype('int')*stride-h+1, stride):
        for j in range(0, np.ceil(data.shape[2]/stride).astype('int')*stride-w+1, stride):
            if i+h > data.shape[1] and j+w > data.shape[2]:
                collect.append(data[:,-h:,-w:,:])
            elif i+h > data.shape[1]:
                collect.append(data[:,-h:,j:j+w,:])
            elif j+w > data.shape[2]:
                collect.append(data[:,i:i+h,-w:,:])
            else:
                collect.append(data[:,i:i+h,j:j+w,:])
    crop_data = np.concatenate(collect, axis=0)

    return crop_data

converter = trt.TrtGraphConverterV2(
   input_saved_model_dir=SAVED_MODEL_DIR,
   precision_mode=trt.TrtPrecisionMode.FP32
)

def input_fn():
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.convert()
converter.build(input_fn=input_fn)
converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)

saved_model_loaded = tf.saved_model.load(OUTPUT_SAVED_MODEL_DIR, tags=[tag_constants.SERVING])
signature_keys = list(saved_model_loaded.signatures.keys())
print(signature_keys)

model = saved_model_loaded.signatures['serving_default']

X = np.concatenate([crop(X_test[0].reshape(-1, 256, 256, 3))], axis=0)     # X_test[0] is an image with shape (256,256,3)
image_input = tf.constant(X.astype('float32'))
predict_ = model(input_1=image_input)
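
As a quick sanity check on the input shape, the loop bounds in crop() can be reproduced without NumPy. This is a sketch only: crop_offsets is a hypothetical helper that mirrors the two range() expressions in crop() and returns the loop offsets, not the clamped window positions.

```python
import math


def crop_offsets(height, width, h=256, w=256, stride=128):
    """Return the (i, j) loop offsets that crop() iterates over."""
    rows = range(0, math.ceil(height / stride) * stride - h + 1, stride)
    cols = range(0, math.ceil(width / stride) * stride - w + 1, stride)
    return [(i, j) for i in rows for j in cols]


offsets = crop_offsets(256, 256)
print(offsets)       # [(0, 0)]
print(len(offsets))  # 1
```

For a single 256x256 image this gives one window, so the tensor passed to the model has shape (1,256,256,3), the same shape that input_fn yields for converter.build.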

Hi @clairewang0409 ,
Based on the error, it looks like your model tries to concatenate two tensors with different sizes.
Have you tried running inference on the model with other frameworks?
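
To illustrate the rule behind the ConcatOp message: every dimension except the concatenation axis must be equal across the inputs. The sketch below is pure Python and does not call TensorFlow. The function name concat_shape is hypothetical, and the use of the last axis (channels in NHWC) is an assumption based on the "Dimension 1" wording in the log.

```python
def concat_shape(shapes, axis):
    """Return the shape produced by concatenating `shapes` along `axis`.

    Raises ValueError if any dimension other than `axis` differs.
    """
    first = list(shapes[0])
    rank = len(first)
    axis = axis % rank  # allow negative axes such as -1
    result = list(first)
    for shape in shapes[1:]:
        if len(shape) != rank:
            raise ValueError(f"Rank mismatch: {first} vs. {list(shape)}")
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(
                    f"Dimension {dim} in both shapes must be equal: "
                    f"{first} vs. {list(shape)}"
                )
        result[axis] += shape[axis]
    return result


# Shapes the graph expects for both inputs (batch size 1).
print(concat_shape([[1, 32, 32, 64], [1, 32, 32, 64]], axis=-1))  # [1, 32, 32, 128]

# Shapes reported at runtime in the log.
try:
    concat_shape([[1, 32, 32, 64], [1, 64, 32, 32]], axis=-1)
except ValueError as err:
    print(err)
```

With the shapes the graph expects, the concatenation is valid. With the shapes seen at runtime, it fails at dimension 1, the same dimension named in the log.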