Conv1D implemented in TensorFlow hits an error when I run the op with TensorRT

Description

tf code:

from tensorflow.python.framework import graph_util
from tensorflow.python.platform import gfile
import tensorflow as tf
import tf2onnx
import numpy as np
import onnxruntime as rt
import tensorrt as trt
import pycuda.driver as cuda 
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
def test_conv1d_ckpt2pb():
    save_path = '../models/pb/op_conv1d.pb'
    inputs = tf.placeholder(tf.float32, shape=([1,6,10]), name='inputs')
    outputs = tf.layers.conv1d(inputs, 5, kernel_size=3,
        strides=1, padding='same', dilation_rate=1, activation=tf.nn.relu)
    outputs = tf.identity(outputs, name='outputs')

    # ckpt2pb: freeze variables into constants
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    graph = tf.get_default_graph()
    output_graph_def = graph_util.convert_variables_to_constants(sess,
        graph.as_graph_def(), ['outputs'],
        variable_names_whitelist=None, variable_names_blacklist=None)
    
    with gfile.GFile(save_path, "wb") as f:
        f.write(output_graph_def.SerializeToString())
    sess.close()

def test_conv1d_pb2onnx():
    onnx_path = '../models/onnx/op_conv1d.onnx'
    config=tf.ConfigProto(allow_soft_placement=True, gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=config)
    pb_path = '../models/pb/op_conv1d.pb'
    with gfile.FastGFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        sess.graph.as_default()
        tf.import_graph_def(graph_def, name='')

    onnx_graph = tf2onnx.tfonnx.process_tf_graph(sess.graph,
        input_names=["inputs:0"], output_names=["outputs:0"])
    model_proto = onnx_graph.make_model("test")
    with open(onnx_path, "wb") as f:
        f.write(model_proto.SerializeToString())
    sess.close()

EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
BATCH_SIZE = 1
def test_conv1d_trt_infer():
    onnx_path = '../models/onnx/op_conv1d.onnx'
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    builder.max_workspace_size = 1 << 30
    builder.max_batch_size = 1

    with open(onnx_path, 'rb') as model:
        print('Beginning ONNX file parsing')
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))

    print('onnx parse end')
    network.get_input(0).shape = [1,6,10]
    engine = builder.build_cuda_engine(network)

    # allocate_buffers / do_inference_v2 are the helper functions from the
    # TensorRT Python samples' common.py
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    with engine.create_execution_context() as context:

        inputs[0].host = np.ones([1,6,10], dtype=np.float32)
        outputs_ = do_inference_v2(context, bindings=bindings, inputs=inputs, 
            outputs=outputs, stream=stream)
        print('trt:', outputs_[0])

test_conv1d_ckpt2pb()
test_conv1d_pb2onnx()
test_conv1d_trt_infer()
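The TensorRT output above can be sanity-checked against a plain NumPy reference for a 'same'-padded Conv1D + ReLU in TF's channels-last layout. This is only a sketch: the all-ones weights and zero bias here are hypothetical stand-ins, not the randomly initialized values baked into the frozen graph.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Reference 'same'-padded, stride-1 Conv1D + ReLU in NWC layout.
    x: (batch, width, in_ch), w: (kernel, in_ch, out_ch), b: (out_ch,)"""
    k = w.shape[0]
    pad = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, k - 1 - pad), (0, 0)))
    out = np.zeros((x.shape[0], x.shape[1], w.shape[2]), dtype=x.dtype)
    for t in range(x.shape[1]):
        # window (batch, kernel, in_ch) contracted against w over kernel and in_ch
        out[:, t, :] = np.tensordot(xp[:, t:t + k, :], w, axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)  # ReLU

x = np.ones([1, 6, 10], dtype=np.float32)   # same input as the TRT script
w = np.ones([3, 10, 5], dtype=np.float32)   # hypothetical weights
b = np.zeros([5], dtype=np.float32)
y = conv1d_same(x, w, b)
print(y.shape)  # (1, 6, 5): interior positions sum 30, edges 20
```

With real weights, loading them from the frozen graph and comparing against `outputs_[0]` element-wise would confirm the engine computes the right values.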

Log from the tf → onnx → trt path, built with `TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)`:

[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:509: Marking outputs:0_1 as output: outputs:0
[TensorRT] VERBOSE: Applying generic optimizations to the graph for inference.
[TensorRT] VERBOSE: Original: 14 layers
[TensorRT] VERBOSE: After dead-layer removal: 14 layers
[TensorRT] VERBOSE: After Myelin optimization: 14 layers
[TensorRT] VERBOSE: After scale fusion: 14 layers
[TensorRT] VERBOSE: Fusing conv1d/conv1d/ExpandDims_1 with (Unnamed Layer* 1) [Shuffle]
[TensorRT] VERBOSE: Fusing conv1d/conv1d/ExpandDims with conv1d/conv1d/Conv2D__5
[TensorRT] VERBOSE: Fusing conv1d/conv1d/Conv2D__7 with conv1d/conv1d/Squeeze
[TensorRT] VERBOSE: Fusing conv1d/BiasAdd with (Unnamed Layer* 9) [Shuffle]
[TensorRT] VERBOSE: Fusing (Unnamed Layer* 10) [ElementWise] with conv1d/Relu
[TensorRT] VERBOSE: After vertical fusions: 9 layers
[TensorRT] VERBOSE: After final dead-layer removal: 9 layers
[TensorRT] VERBOSE: After tensor merging: 9 layers
[TensorRT] VERBOSE: After concat removal: 9 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.00118997 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 out of 1
*************** Autotuning format combination: → Float(1,5,50,150) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.005888
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.005888
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,10,100) → Float(1,10,10,100) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/ExpandDims + conv1d/conv1d/Conv2D__5 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.00624
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.00624
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,5,50,150) → Float(1,3,3,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/Conv2D__6 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,5,50:32,50) → Float(1,3,3:32,3) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/Conv2D__6 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,10,10,100), Float(1,3,3,30) → Float(1,10,10,50) ***************
python: …/rtExt/cuda/customWinogradConvActRunner.cpp:48: nvinfer1::rt::cuda::WinogradConvActRunner::WinogradConvActRunner(nvinfer1::rt::DefaultRunnerParameters, const nvinfer1::ConvolutionParameters&, const std::vector&): Assertion `matchNbDims(mSrcTensors[0], mDstTensors[0]) && (mSrcTensors.size() == 1 || matchValidDims(popFront(mSrcTensors[1].extent), popFront(mDstTensors[0].extent)))' failed.
Aborted (core dumped)

Does TensorRT not support the Conv1D op?

Environment

TensorRT Version: 7.0.0.11
GPU Type: 1080Ti
Nvidia Driver Version: 418.56
CUDA Version: 10.0
CUDNN Version: 7.6.5.32
Operating System + Version: ubuntu 16.04
Python Version (if applicable): 3.6.4
TensorFlow Version (if applicable): 1.13.2

Relevant Files

Steps To Reproduce

torch code:

import onnxruntime
import numpy as np
import os
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel,self).__init__()
        self.conv1d = torch.nn.Conv1d(in_channels=10, out_channels=5, kernel_size=3, padding=1, dilation=1)
        self.relu = torch.nn.ReLU(inplace=False)
    def forward(self, inputs):
        return self.relu(self.conv1d(inputs))

def export_onnx_model():
    onnx_model_file = '../models/onnx/op_conv1d_torch.onnx'
    inputs = torch.from_numpy(np.ones([1,10,6], dtype=np.float32)).float()
    model = MyModel()

    torch.onnx.export( model, (inputs), onnx_model_file, verbose=False, 
        input_names=['inputs'],output_names=['outputs'], opset_version =10,
        dynamic_axes={"inputs" : {0: "batch_size"}})
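Note that the input layouts differ between the two scripts: the TF script feeds `(1, 6, 10)` in channels-last (NWC) order, while this torch script feeds `(1, 10, 6)` in channels-first (NCW) order. The two are transposes of each other:

```python
import numpy as np

x_tf = np.ones([1, 6, 10], dtype=np.float32)  # TF conv1d layout: (batch, width=6, channels=10)
x_torch = x_tf.transpose(0, 2, 1)             # torch Conv1d layout: (batch, channels=10, width=6)
print(x_torch.shape)  # (1, 10, 6)
```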

TensorRT run log for the conv1d implemented in torch:

[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::GridAnchor_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::NMS_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Reorg_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Region_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::PriorBox_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Normalize_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::RPROI_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::FlattenConcat_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::CropAndResize
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Proposal
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::BatchTilePlugin_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::DetectionLayer_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::ProposalLayer_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::PyramidROIAlign_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::ResizeNearest_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::SpecialSlice_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::InstanceNormalization_TRT
importInputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:211: Adding network input: inputs with dtype: float32, dimensions: (-1, 10, 6)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: inputs for ONNX tensor: inputs
parseGraph
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:90: Importing initializer: conv1d.bias
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:90: Importing initializer: conv1d.weight
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:107: Parsing node: [Conv]
Parsing node: [Conv]
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: inputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: conv1d.weight
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: conv1d.bias
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:131: [Conv] inputs: [inputs → (-1, 10, 6)], [conv1d.weight → (5, 10, 3)], [conv1d.bias → (5)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:441: Convolution input dimensions: (-1, 10, 6)
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
1
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
4
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:1462: Original shape: (_, _, _), unsqueezing to: (_, _, _, _)
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:1323: Original shape: (_, _, _, _), squeezing to: (_, _, _)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:550: Using kernel: (3, 1), strides: (1, 1), padding: (1, 0), dilations: (1, 1), numOutputs: 5
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:551: Convolution output dimensions: (-1, 5, -1, -1)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:122: Registering layer: (Unnamed Layer* 0) [Shape] for ONNX node:
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: 3 for ONNX tensor: 3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:186: [Conv] outputs: [3 → (-1, -1, -1)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:107: Parsing node: [Relu]
Parsing node: [Relu]
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: 3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:131: [Relu] inputs: [3 → (-1, -1, -1)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:122: Registering layer: (Unnamed Layer* 11) [Activation] for ONNX node:
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: outputs_1 for ONNX tensor: outputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:186: [Relu] outputs: [outputs → (-1, -1, -1)],
parseGraph end
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:509: Marking outputs_1 as output: outputs
importModel:0
onnx parse end
[TensorRT] VERBOSE: Applying generic optimizations to the graph for inference.
[TensorRT] VERBOSE: Original: 4 layers
[TensorRT] VERBOSE: After dead-layer removal: 4 layers
[TensorRT] VERBOSE: After Myelin optimization: 4 layers
[TensorRT] VERBOSE: After scale fusion: 4 layers
[TensorRT] VERBOSE: After vertical fusions: 4 layers
[TensorRT] VERBOSE: After final dead-layer removal: 4 layers
[TensorRT] VERBOSE: After tensor merging: 4 layers
[TensorRT] VERBOSE: After concat removal: 4 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.000932611 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 out of 1
*************** Autotuning format combination: Float(1,6,60) → Float(1,1,6,60) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 5) [Shuffle] (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,1,6,60) → Float(1,1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (LegacySASSConvolution)
[TensorRT] VERBOSE: LegacySASSConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (FusedConvActConvolution)
[TensorRT] VERBOSE: FusedConvActConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CaskConvolution)
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: 1825138533642645384 time 0.018496
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 3915320020053085238 time 0.018432
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 6808617066150061604 time 0.016064
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -8060443123034038864 time 0.016384
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -4420849921117327522 time 0.01296
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: -3946921629105938337 time 0.016192
[TensorRT] VERBOSE: Fastest Tactic: -4420849921117327522 Time: 0.01296
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CudaConvolution)
[TensorRT] VERBOSE: Tactic: 0 time 0.012288
[TensorRT] VERBOSE: Tactic: 1 time 0.020288
[TensorRT] VERBOSE: Tactic: 2 time 0.019808
[TensorRT] VERBOSE: Tactic: 4 time 0.022912
[TensorRT] VERBOSE: Tactic: 5 time 0.029856
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.012288
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CudaDepthwiseConvolution)
[TensorRT] VERBOSE: CudaDepthwiseConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: >>>>>>>>>>>>>>> Chose Runner Type: CudaConvolution Tactic: 0
[TensorRT] VERBOSE:
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,1,6,30) → Float(1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 10) [Shuffle] (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,6,30) → Float(1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 11) [Activation] (Activation)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: Formats and tactics selection completed in 0.228408 seconds.
[TensorRT] VERBOSE: After reformat layers: 4 layers
[TensorRT] VERBOSE: Block size 1073741824
[TensorRT] VERBOSE: Block size 512
[TensorRT] VERBOSE: Block size 512
[TensorRT] VERBOSE: Block size 0
[TensorRT] VERBOSE: Total Activation Memory: 1073742848
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] VERBOSE: Engine generation completed in 11.1885 seconds.
[TensorRT] VERBOSE: Engine Layer Information:
[TensorRT] VERBOSE: Layer(Shuffle): (Unnamed Layer* 5) [Shuffle], Tactic: 0, inputs[Float(10,6)] → (Unnamed Layer* 5) [Shuffle]_output[Float(10,6,1)]
[TensorRT] VERBOSE: Layer(Convolution): (Unnamed Layer* 6) [Convolution], Tactic: 0, (Unnamed Layer* 5) [Shuffle]_output[Float(10,6,1)] → (Unnamed Layer* 6) [Convolution]_output[Float(5,6,1)]
[TensorRT] VERBOSE: Layer(Activation): (Unnamed Layer* 11) [Activation], Tactic: 0, 3[Float(5,6)] → outputs[Float(5,6)]
0 inputs
1 outputs
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles

If I use torch instead of tf, the error disappears.
torch->onnx->trt
So it seems TensorRT can run Conv1D when it is exported from torch, but errors out when it is exported from tf. The logs hint at why: TensorFlow lowers conv1d to an ExpandDims → Conv2D → Squeeze pattern (14 layers after import), while torch exports a single Conv node (4 layers), and the extra Shuffle/Conv fusions from the TF graph trip the WinogradConvActRunner assertion.
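The two exports describe the same arithmetic, just in different layouts and graph lowerings. A NumPy sketch (with hypothetical random weights) shows that a channels-last (TF-style) convolution and a channels-first (torch-style) convolution agree up to a transpose, so the crash is a graph-structure issue rather than a numerical one:

```python
import numpy as np

rng = np.random.default_rng(0)
x_nwc = rng.standard_normal((1, 6, 10)).astype(np.float32)  # TF layout: (batch, width, channels)
w = rng.standard_normal((3, 10, 5)).astype(np.float32)      # (kernel, in_ch, out_ch), hypothetical

def conv_nwc(x, w):
    # 'same'-padded stride-1 Conv1D in channels-last (TF) layout
    k = w.shape[0]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2), (0, 0)))
    return np.stack([np.tensordot(xp[:, t:t + k, :], w, axes=([1, 2], [0, 1]))
                     for t in range(x.shape[1])], axis=1)

def conv_ncw(x, w):
    # Same convolution in channels-first (torch) layout; weight as (out_ch, in_ch, kernel)
    w_ncw = w.transpose(2, 1, 0)
    k = w_ncw.shape[2]
    xp = np.pad(x, ((0, 0), (0, 0), (k // 2, k // 2)))
    return np.stack([np.tensordot(xp[:, :, t:t + k], w_ncw, axes=([1, 2], [1, 2]))
                     for t in range(x.shape[2])], axis=2)

y_nwc = conv_nwc(x_nwc, w)                     # (1, 6, 5)
y_ncw = conv_ncw(x_nwc.transpose(0, 2, 1), w)  # (1, 5, 6)
print(np.allclose(y_nwc, y_ncw.transpose(0, 2, 1), atol=1e-5))  # True
```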

Hi,

Can you try a few things:

  1. Check the ONNX model with the checker function and see if it passes:
    import onnx
    model = onnx.load("model.onnx")
    onnx.checker.check_model(model)

  2. If (1) passes, please try the trtexec commands to generate the TRT engine:
    https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes

Thanks

Hi,
Thank you for your attention. I fixed this problem by adding the line below when converting pb to onnx:

  onnx_graph = tf2onnx.tfonnx.process_tf_graph(sess.graph,
      input_names=["inputs:0"], output_names=["outputs:0"])
  onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)  # the fix
  model_proto = onnx_graph.make_model("test")

After optimizing the tf graph, TensorRT runs the model successfully.
Thank you again.