Description
Hi, I have successfully run the Python example code from the "Accelerating Inference in TensorFlow with TensorRT User Guide" (NVIDIA Docs), so I believe my environment is set up correctly.
However, I get errors when I try to convert my own TensorFlow model to TensorRT.
In the first case, I get an error when I run:
def input_fn():
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.build(input_fn=input_fn)
(1, 256, 256, 3) is the input shape of my original model, but I get the error below:
2024-05-24 15:40:25.344989: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.345062: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.352216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.353862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory: -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.537977: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 2
2024-05-24 15:40:25.538064: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-05-24 15:40:25.545232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9852 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:c1:00.0, compute capability: 8.6
2024-05-24 15:40:25.546854: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13079 MB memory: -> device: 1, name: NVIDIA A2, pci bus id: 0000:a1:00.0, compute capability: 8.6
2024-05-24 15:40:25.569157: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:186] Calibration with FP32 or FP16 is not implemented. Falling back to use_calibration = False.Note that the default value of use_calibration is True.
2024-05-24 15:40:25.573340: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:962]
################################################################################
TensorRT unsupported/non-converted OP Report:
- Conv2DBackpropInput -> 4x
- Pack -> 4x
- Shape -> 4x
- StridedSlice -> 4x
- NoOp -> 2x
- Identity -> 1x
- Placeholder -> 1x
--------------------------------------------------------------------------------
- Total nonconverted OPs: 20
- Total nonconverted OP Types: 7
For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops.
################################################################################
2024-05-24 15:40:25.574199: W tensorflow/compiler/tf2tensorrt/segment/segment.cc:1290] The environment variable TF_TRT_MAX_ALLOWED_ENGINES=20 has no effect since there are only 5 TRT Engines with at least minimum_segment_size=3 nodes.
2024-05-24 15:40:25.574236: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:799] Number of TensorRT candidate segments: 5
2024-05-24 15:40:25.576787: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 0 consisting of 56 nodes by TRTEngineOp_002_000.
2024-05-24 15:40:25.576886: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 1 consisting of 13 nodes by TRTEngineOp_002_001.
2024-05-24 15:40:25.576934: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 2 consisting of 13 nodes by TRTEngineOp_002_002.
2024-05-24 15:40:25.576978: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 3 consisting of 13 nodes by TRTEngineOp_002_003.
2024-05-24 15:40:25.577020: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:913] Replaced segment 4 consisting of 18 nodes by TRTEngineOp_002_004.
2024-05-24 15:40:26.849448: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:847] TF-TRT Warning: Running native segment forTRTEngineOp_002_001 due to failure in verifying input shapes: Input shapes do not match input partial shapes stored in graph, for TRTEngineOp_002_001: [[1,32,32,64], [1,64,32,32]] != [[?,32,32,64], [?,32,32,64]]
2024-05-24 15:40:26.856632: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
[[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856696: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at trt_engine_op.cc:644 : INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
[[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
2024-05-24 15:40:26.856717: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: {{function_node TRTEngineOp_002_001_native_segment}} ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
[[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
[[TRTEngineOp_002_001]]
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
Cell In[13], line 29
27 converter.convert()
28 # converter.summary()
---> 29 converter.build(input_fn=input_fn)
30 converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/compiler/tensorrt/trt_convert.py:1495, in TrtGraphConverterV2.build(self, input_fn)
1493 first_input = inp
1494 args, kwargs = _convert_to_tensor(inp)
-> 1495 func(*args, **kwargs)
1497 if self._need_trt_profiles():
1498 # Disable profile generation.
1499 self._for_each_trt_node(self._converted_graph_def,
1500 partial(_set_profile_generation_mode, False))
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1477, in ConcreteFunction.__call__(self, *args, **kwargs)
1427 def __call__(self, *args, **kwargs):
1428 """Executes the wrapped function.
1429
1430 ConcreteFunctions have two signatures:
(...)
1475 TypeError: If the arguments do not match the function's signature.
1476 """
-> 1477 return self._call_impl(args, kwargs)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/wrap_function.py:243, in WrappedFunction._call_impl(self, args, kwargs, cancellation_manager)
241 return self._call_flat(args, self.captured_inputs)
242 else:
--> 243 return super(WrappedFunction, self)._call_impl(
244 args, kwargs, cancellation_manager)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1495, in ConcreteFunction._call_impl(self, args, kwargs, cancellation_manager)
1492 except TypeError:
1493 raise structured_err
-> 1495 return self._call_with_flat_signature(args, kwargs, cancellation_manager)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1549, in ConcreteFunction._call_with_flat_signature(self, args, kwargs, cancellation_manager)
1544 if not isinstance(
1545 arg, (ops.Tensor, resource_variable_ops.BaseResourceVariable)):
1546 raise TypeError(f"{self._flat_signature_summary()}: expected argument "
1547 f"#{i}(zero-based) to be a Tensor; "
1548 f"got {type(arg).__name__} ({arg}).")
-> 1549 return self._call_flat(args, self.captured_inputs, cancellation_manager)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1757, in ConcreteFunction._call_flat(self, args, captured_inputs, cancellation_manager)
1753 possible_gradient_type = gradients_util.PossibleTapeGradientTypes(args)
1754 if (possible_gradient_type == gradients_util.POSSIBLE_GRADIENT_TYPES_NONE
1755 and executing_eagerly):
1756 # No tape is watching; skip to running the function.
-> 1757 return self._build_call_outputs(self._inference_function.call(
1758 ctx, args, cancellation_manager=cancellation_manager))
1759 forward_backward = self._select_forward_and_backward_functions(
1760 args,
1761 possible_gradient_type,
1762 executing_eagerly)
1763 forward_function, args_with_tangents = forward_backward.forward()
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:381, in _EagerDefinedFunction.call(self, ctx, args, cancellation_manager)
379 with _InterpolateFunctionError(self):
380 if cancellation_manager is None:
--> 381 outputs = execute.execute(
382 str(self.signature.name),
383 num_outputs=self._num_outputs,
384 inputs=args,
385 attrs=attrs,
386 ctx=ctx)
387 else:
388 outputs = execute.execute_with_cancellation(
389 str(self.signature.name),
390 num_outputs=self._num_outputs,
(...)
393 ctx=ctx,
394 cancellation_manager=cancellation_manager)
File ~/anaconda3/envs/tf2/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:52, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
50 try:
51 ctx.ensure_initialized()
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
53 inputs, attrs, num_outputs)
54 except core._NotOkStatusException as e:
55 if name is not None:
InvalidArgumentError: Graph execution error:
ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [1,32,32,64] vs. shape[1] = [1,64,32,32]
[[{{node StatefulPartitionedCall/model/concatenate/concat}}]]
[[TRTEngineOp_002_001]] [Op:__inference_pruned_15869]
In the second case, if I skip converter.build() and save the converted model directly, no error is raised at that point. However, the same error still occurs at inference time.
Because my original TensorFlow model runs successfully, I think the shape error is not caused by the model architecture or the input data, but possibly by the TF-TRT converter.
How can I fix this problem? Is there a conversion step I missed?
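To make the mismatch in the log easier to read, here is a minimal sketch of the kind of check the TRTEngineOp_002_001 warning describes. It is plain Python and not TF-TRT's actual code; the helper names matches_partial and nhwc_to_nchw are hypothetical, and the shapes are copied from the warning. It only shows that the second concat input has the same dimensions as the expected shape in NCHW order instead of NHWC order. That is consistent with a layout swap somewhere in the converted graph, but it does not prove one.

```python
def matches_partial(actual, partial):
    """True if `actual` fits `partial`; None stands for an unknown ('?') dimension."""
    return len(actual) == len(partial) and all(
        p is None or a == p for a, p in zip(actual, partial)
    )


def nhwc_to_nchw(shape):
    """Reorder a 4-D NHWC shape into NCHW order."""
    n, h, w, c = shape
    return [n, c, h, w]


# Shapes copied from the TRTEngineOp_002_001 warning in the log above.
actual_inputs = [[1, 32, 32, 64], [1, 64, 32, 32]]
partial_inputs = [[None, 32, 32, 64], [None, 32, 32, 64]]

for idx, (actual, partial) in enumerate(zip(actual_inputs, partial_inputs)):
    ok = matches_partial(actual, partial)
    # If the input fails the check but equals the NCHW permutation of a shape
    # that would have passed, the dimensions are right and only their order differs.
    swapped = (not ok) and nhwc_to_nchw([actual[0]] + partial[1:]) == actual
    print(idx, ok, swapped)
```

The first input passes the check and the second one fails it only because of the dimension order, which is why the input size itself looks correct.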
Environment
TensorRT Version: 8.4.3
GPU Type: NVIDIA GeForce RTX 3060
Nvidia Driver Version: 535.171.04
CUDA Version: 11.4
CUDNN Version: 8.9.2.26
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.9
TensorFlow Version (if applicable): 2.12.0
Relevant Files
The file below includes “native_saved_model” and “tftrt_saved_model”:
models.zip (8.3 MB)
Steps To Reproduce
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import tag_constants

# SAVED_MODEL_DIR, OUTPUT_SAVED_MODEL_DIR and X_test are defined earlier (not shown).

def crop(data, h=256, w=256, stride=128):
    # Split an NHWC batch into h x w tiles; windows that would overrun the
    # border are taken from the last h rows / w columns instead.
    collect = []
    for i in range(0, np.ceil(data.shape[1]/stride).astype('int')*stride-h+1, stride):
        for j in range(0, np.ceil(data.shape[2]/stride).astype('int')*stride-w+1, stride):
            if i+h > data.shape[1] and j+w > data.shape[2]:
                collect.append(data[:,-h:,-w:,:])
            elif i+h > data.shape[1]:
                collect.append(data[:,-h:,j:j+w,:])
            elif j+w > data.shape[2]:
                collect.append(data[:,i:i+h,-w:,:])
            else:
                collect.append(data[:,i:i+h,j:j+w,:])
    crop_data = np.concatenate(collect, axis=0)
    return crop_data

# Convert the SavedModel with TF-TRT in FP32.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    precision_mode=trt.TrtPrecisionMode.FP32
)

def input_fn():
    for _ in range(10):
        inp1 = np.random.normal(size=(1,256,256,3)).astype(np.float32)
        # yield tf.random.normal((1, 266, 256, 3)),
        yield inp1,

converter.convert()
converter.build(input_fn=input_fn)
converter.save(output_saved_model_dir=OUTPUT_SAVED_MODEL_DIR)

# Load the converted model and run inference.
saved_model_loaded = tf.saved_model.load(OUTPUT_SAVED_MODEL_DIR, tags=[tag_constants.SERVING])
signature_keys = list(saved_model_loaded.signatures.keys())
print(signature_keys)
model = saved_model_loaded.signatures['serving_default']

X = np.concatenate([crop(X_test[0].reshape(-1, 256, 256, 3))], axis=0)  # X_test[0] is an image with shape (256,256,3)
image_input = tf.constant(X.astype('float32'))
predict_ = model(input_1=image_input)
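As a sanity check on the shape that reaches the model at inference time, here is a minimal sketch of the window arithmetic used by crop() above, in plain Python without NumPy or TensorFlow. The helper names tile_starts and cropped_shape are hypothetical and only mirror the range() expressions and the border fallback in crop(); they assume the image is at least as large as the tile.

```python
import math


def tile_starts(size, tile=256, stride=128):
    """Start offsets along one axis, mirroring the range() used in crop()."""
    upper = math.ceil(size / stride) * stride - tile + 1
    starts = []
    for i in range(0, upper, stride):
        # crop() takes the last `tile` pixels when a window would overrun the border.
        starts.append(size - tile if i + tile > size else i)
    return starts


def cropped_shape(shape, tile=256, stride=128):
    """Shape of crop()'s output for an NHWC input of the given shape."""
    n, h, w, c = shape
    tiles = len(tile_starts(h, tile, stride)) * len(tile_starts(w, tile, stride))
    return (n * tiles, tile, tile, c)


print(tile_starts(256))                  # [0]
print(cropped_shape((1, 256, 256, 3)))   # (1, 256, 256, 3)
print(tile_starts(300))                  # [0, 44]
print(cropped_shape((1, 300, 300, 3)))   # (4, 256, 256, 3)
```

For a single 256x256 image this yields exactly one tile, so the tensor passed to the converted model has the same (1, 256, 256, 3) shape as the one produced by input_fn.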