Description
tf code:
from tensorflow.python.framework import graph_util
from tensorflow.python.platform import gfile
import tensorflow as tf
import tf2onnx
import numpy as np
import onnxruntime as rt
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
def test_conv1d_ckpt2pb():
    save_path = '../models/pb/op_conv1d.pb'
    inputs = tf.placeholder(tf.float32, shape=[1, 6, 10], name='inputs')
    outputs = tf.layers.conv1d(inputs, 5, kernel_size=3,
                               strides=1, padding='same', dilation_rate=1, activation=tf.nn.relu)
    outputs = tf.identity(outputs, name='outputs')
    # ckpt2pb: initialize the variables, then freeze them into a frozen graph (.pb)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    graph = tf.get_default_graph()
    output_graph_def = graph_util.convert_variables_to_constants(
        sess, graph.as_graph_def(), ['outputs'],
        variable_names_whitelist=None, variable_names_blacklist=None)
    with gfile.GFile(save_path, "wb") as f:
        f.write(output_graph_def.SerializeToString())
    sess.close()
def test_conv1d_pb2onnx():
    onnx_path = '../models/onnx/op_conv1d.onnx'
    config = tf.ConfigProto(allow_soft_placement=True,
                            gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=config)
    pb_path = '../models/pb/op_conv1d.pb'
    # load the frozen graph into the session's graph
    with gfile.FastGFile(pb_path, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        sess.graph.as_default()
        tf.import_graph_def(graph_def, name='')
    # convert the imported graph to ONNX with tf2onnx
    onnx_graph = tf2onnx.tfonnx.process_tf_graph(sess.graph,
                                                 input_names=["inputs:0"], output_names=["outputs:0"])
    model_proto = onnx_graph.make_model("test")
    with open(onnx_path, "wb") as f:
        f.write(model_proto.SerializeToString())
    sess.close()
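onnxruntime is imported as rt above but never used in the snippet; a minimal sanity check of the exported ONNX (a sketch, to be called between test_conv1d_pb2onnx() and test_conv1d_trt_infer()) would look like:

def test_conv1d_onnx_infer():
    # run the exported ONNX with onnxruntime on the same all-ones input
    onnx_path = '../models/onnx/op_conv1d.onnx'
    ort_sess = rt.InferenceSession(onnx_path)
    feed = {'inputs:0': np.ones([1, 6, 10], dtype=np.float32)}
    print('onnxruntime:', ort_sess.run(['outputs:0'], feed)[0])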
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
BATCH_SIZE = 1
def test_conv1d_trt_infer():
    onnx_path = '../models/onnx/op_conv1d.onnx'
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(EXPLICIT_BATCH)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    builder.max_workspace_size = 1 << 30
    builder.max_batch_size = 1
    # parse the ONNX model into the TensorRT network
    with open(onnx_path, 'rb') as model:
        print('Beginning ONNX file parsing')
        if not parser.parse(model.read()):
            print('ERROR: Failed to parse the ONNX file.')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
    print('onnx parse end')
    network.get_input(0).shape = [1, 6, 10]
    # build the engine and run inference on an all-ones input
    engine = builder.build_cuda_engine(network)
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    with engine.create_execution_context() as context:
        inputs[0].host = np.ones([1, 6, 10], dtype=np.float32)
        outputs_ = do_inference_v2(context, bindings=bindings, inputs=inputs,
                                   outputs=outputs, stream=stream)
        print('trt:', outputs_[0])

test_conv1d_ckpt2pb()
test_conv1d_pb2onnx()
test_conv1d_trt_infer()
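allocate_buffers and do_inference_v2 are not shown above; they follow the usual HostDeviceMem helpers from the TensorRT Python samples (common.py). A minimal sketch of that pattern (one possible definition, to be placed before the three test_* calls):

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem      # page-locked host buffer
        self.device = device_mem  # device buffer

def allocate_buffers(engine):
    # one host/device buffer pair per binding, plus a CUDA stream
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream

def do_inference_v2(context, bindings, inputs, outputs, stream):
    # copy inputs to the device, run the engine, copy the outputs back
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    stream.synchronize()
    return [out.host for out in outputs]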
tf -> onnx -> trt run logs:
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:509: Marking outputs:0_1 as output: outputs:0
[TensorRT] VERBOSE: Applying generic optimizations to the graph for inference.
[TensorRT] VERBOSE: Original: 14 layers
[TensorRT] VERBOSE: After dead-layer removal: 14 layers
[TensorRT] VERBOSE: After Myelin optimization: 14 layers
[TensorRT] VERBOSE: After scale fusion: 14 layers
[TensorRT] VERBOSE: Fusing conv1d/conv1d/ExpandDims_1 with (Unnamed Layer* 1) [Shuffle]
[TensorRT] VERBOSE: Fusing conv1d/conv1d/ExpandDims with conv1d/conv1d/Conv2D__5
[TensorRT] VERBOSE: Fusing conv1d/conv1d/Conv2D__7 with conv1d/conv1d/Squeeze
[TensorRT] VERBOSE: Fusing conv1d/BiasAdd with (Unnamed Layer* 9) [Shuffle]
[TensorRT] VERBOSE: Fusing (Unnamed Layer* 10) [ElementWise] with conv1d/Relu
[TensorRT] VERBOSE: After vertical fusions: 9 layers
[TensorRT] VERBOSE: After final dead-layer removal: 9 layers
[TensorRT] VERBOSE: After tensor merging: 9 layers
[TensorRT] VERBOSE: After concat removal: 9 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.00118997 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 out of 1
[TensorRT] VERBOSE: *************** Autotuning format combination: → Float(1,5,50,150) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.005888
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.005888
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,10,100) → Float(1,10,10,100) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/ExpandDims + conv1d/conv1d/Conv2D__5 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.00624
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.00624
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,5,50,150) → Float(1,3,3,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/Conv2D__6 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,5,50:32,50) → Float(1,3,3:32,3) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: conv1d/conv1d/Conv2D__6 (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: --------------- Timing Runner: (Reformat)
[TensorRT] VERBOSE: Tactic: 0 time 0.006144
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.006144
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,10,10,100), Float(1,3,3,30) → Float(1,10,10,50) ***************
python: …/rtExt/cuda/customWinogradConvActRunner.cpp:48: nvinfer1::rt::cuda::WinogradConvActRunner::WinogradConvActRunner(nvinfer1::rt::DefaultRunnerParameters, const nvinfer1::ConvolutionParameters&, const std::vector&): Assertion `matchNbDims(mSrcTensors[0], mDstTensors[0]) && (mSrcTensors.size() == 1 || matchValidDims(popFront(mSrcTensors[1].extent), popFront(mDstTensors[0].extent)))' failed.
Aborted (core dumped)
Does TensorRT not support the conv1d op?
Environment
TensorRT Version: 7.0.0.11
GPU Type: 1080Ti
Nvidia Driver Version: 418.56
CUDA Version: 10.0
CUDNN Version: 7.6.5.32
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.6.4
TensorFlow Version (if applicable): 1.13.2
Relevant Files
Steps To Reproduce
torch code:
import onnxruntime
import numpy as np
import os
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import torch
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1d = torch.nn.Conv1d(in_channels=10, out_channels=5,
                                      kernel_size=3, padding=1, dilation=1)
        self.relu = torch.nn.ReLU(inplace=False)

    def forward(self, inputs):
        return self.relu(self.conv1d(inputs))

def export_onnx_model():
    onnx_model_file = '../models/onnx/op_conv1d_torch.onnx'
    inputs = torch.from_numpy(np.ones([1, 10, 6], dtype=np.float32)).float()
    model = MyModel()
    torch.onnx.export(model, (inputs,), onnx_model_file, verbose=False,
                      input_names=['inputs'], output_names=['outputs'], opset_version=10,
                      dynamic_axes={"inputs": {0: "batch_size"}})
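To rule out a bad export, the ONNX file can also be compared against the eager torch model with onnxruntime (imported above but unused); a minimal sketch, using a scratch file name op_conv1d_torch_check.onnx:

def check_torch_onnx():
    # export one MyModel instance and compare its eager output with onnxruntime
    onnx_model_file = '../models/onnx/op_conv1d_torch_check.onnx'  # scratch path for the check
    x = np.ones([1, 10, 6], dtype=np.float32)
    model = MyModel().eval()
    torch.onnx.export(model, (torch.from_numpy(x),), onnx_model_file,
                      input_names=['inputs'], output_names=['outputs'], opset_version=10)
    with torch.no_grad():
        torch_out = model(torch.from_numpy(x)).numpy()
    ort_out = onnxruntime.InferenceSession(onnx_model_file).run(['outputs'], {'inputs': x})[0]
    print('max abs diff (torch vs onnxruntime):', np.abs(torch_out - ort_out).max())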
TensorRT run logs for the conv1d implemented in torch:
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::GridAnchor_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::NMS_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Reorg_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Region_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::PriorBox_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Normalize_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::RPROI_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::BatchedNMS_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::FlattenConcat_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::CropAndResize
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::Proposal
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::BatchTilePlugin_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::DetectionLayer_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::ProposalLayer_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::PyramidROIAlign_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::ResizeNearest_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::SpecialSlice_TRT
[TensorRT] VERBOSE: Plugin creator registration succeeded - ONNXTRT_NAMESPACE::InstanceNormalization_TRT
importInputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:211: Adding network input: inputs with dtype: float32, dimensions: (-1, 10, 6)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: inputs for ONNX tensor: inputs
parseGraph
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:90: Importing initializer: conv1d.bias
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:90: Importing initializer: conv1d.weight
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:107: Parsing node: [Conv]
Parsing node: [Conv]
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: inputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: conv1d.weight
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: conv1d.bias
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:131: [Conv] inputs: [inputs → (-1, 10, 6)], [conv1d.weight → (5, 10, 3)], [conv1d.bias → (5)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:441: Convolution input dimensions: (-1, 10, 6)
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
1
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
4
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:1462: Original shape: (_, _, _), unsqueezing to: (_, _, _, _)
[TensorRT] WARNING: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:216: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/onnx2trt_utils.cpp:1323: Original shape: (_, _, _, _), squeezing to: (_, _, _)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:550: Using kernel: (3, 1), strides: (1, 1), padding: (1, 0), dilations: (1, 1), numOutputs: 5
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/builtin_op_importers.cpp:551: Convolution output dimensions: (-1, 5, -1, -1)
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:122: Registering layer: (Unnamed Layer* 0) [Shape] for ONNX node:
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: 3 for ONNX tensor: 3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:186: [Conv] outputs: [3 → (-1, -1, -1)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:107: Parsing node: [Relu]
Parsing node: [Relu]
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:125: Searching for input: 3
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:131: [Relu] inputs: [3 → (-1, -1, -1)],
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:122: Registering layer: (Unnamed Layer* 11) [Activation] for ONNX node:
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ImporterContext.hpp:97: Registering tensor: outputs_1 for ONNX tensor: outputs
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:186: [Relu] outputs: [outputs → (-1, -1, -1)],
parseGraph end
[TensorRT] VERBOSE: /mnt/disk0/liuqingjie/TensorRT/parsers/onnx/ModelImporter.cpp:509: Marking outputs_1 as output: outputs
importModel:0
onnx parse end
[TensorRT] VERBOSE: Applying generic optimizations to the graph for inference.
[TensorRT] VERBOSE: Original: 4 layers
[TensorRT] VERBOSE: After dead-layer removal: 4 layers
[TensorRT] VERBOSE: After Myelin optimization: 4 layers
[TensorRT] VERBOSE: After scale fusion: 4 layers
[TensorRT] VERBOSE: After vertical fusions: 4 layers
[TensorRT] VERBOSE: After final dead-layer removal: 4 layers
[TensorRT] VERBOSE: After tensor merging: 4 layers
[TensorRT] VERBOSE: After concat removal: 4 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.000932611 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 out of 1
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,6,60) → Float(1,1,6,60) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 5) [Shuffle] (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,1,6,60) → Float(1,1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (LegacySASSConvolution)
[TensorRT] VERBOSE: LegacySASSConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (FusedConvActConvolution)
[TensorRT] VERBOSE: FusedConvActConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CaskConvolution)
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: 1825138533642645384 time 0.018496
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 3915320020053085238 time 0.018432
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 6808617066150061604 time 0.016064
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -8060443123034038864 time 0.016384
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -4420849921117327522 time 0.01296
[TensorRT] VERBOSE: (Unnamed Layer* 6) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: -3946921629105938337 time 0.016192
[TensorRT] VERBOSE: Fastest Tactic: -4420849921117327522 Time: 0.01296
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CudaConvolution)
[TensorRT] VERBOSE: Tactic: 0 time 0.012288
[TensorRT] VERBOSE: Tactic: 1 time 0.020288
[TensorRT] VERBOSE: Tactic: 2 time 0.019808
[TensorRT] VERBOSE: Tactic: 4 time 0.022912
[TensorRT] VERBOSE: Tactic: 5 time 0.029856
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0.012288
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 6) [Convolution] (CudaDepthwiseConvolution)
[TensorRT] VERBOSE: CudaDepthwiseConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: >>>>>>>>>>>>>>> Chose Runner Type: CudaConvolution Tactic: 0
[TensorRT] VERBOSE:
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,1,6,30) → Float(1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 10) [Shuffle] (Shuffle)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,6,30) → Float(1,6,30) ***************
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 11) [Activation] (Activation)
[TensorRT] VERBOSE: Tactic: 0 is the only option, timing skipped
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 0
[TensorRT] VERBOSE: Formats and tactics selection completed in 0.228408 seconds.
[TensorRT] VERBOSE: After reformat layers: 4 layers
[TensorRT] VERBOSE: Block size 1073741824
[TensorRT] VERBOSE: Block size 512
[TensorRT] VERBOSE: Block size 512
[TensorRT] VERBOSE: Block size 0
[TensorRT] VERBOSE: Total Activation Memory: 1073742848
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] VERBOSE: Engine generation completed in 11.1885 seconds.
[TensorRT] VERBOSE: Engine Layer Information:
[TensorRT] VERBOSE: Layer(Shuffle): (Unnamed Layer* 5) [Shuffle], Tactic: 0, inputs[Float(10,6)] → (Unnamed Layer* 5) [Shuffle]_output[Float(10,6,1)]
[TensorRT] VERBOSE: Layer(Convolution): (Unnamed Layer* 6) [Convolution], Tactic: 0, (Unnamed Layer* 5) [Shuffle]_output[Float(10,6,1)] → (Unnamed Layer* 6) [Convolution]_output[Float(5,6,1)]
[TensorRT] VERBOSE: Layer(Activation): (Unnamed Layer* 11) [Activation], Tactic: 0, 3[Float(5,6)] → outputs[Float(5,6)]
0 inputs
1 outputs
[TensorRT] WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
If I use torch instead of tf (torch -> onnx -> trt), the error disappears.
It seems TensorRT supports conv1d when it is implemented in torch, but the error above appears when conv1d is implemented in tf.
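For reference, the layer names in the failing log (conv1d/conv1d/ExpandDims, conv1d/conv1d/Conv2D__5, conv1d/conv1d/Squeeze) suggest where the difference comes from: tf.layers.conv1d is lowered to an ExpandDims -> Conv2D -> Squeeze pattern before export, while torch exports a single Conv node that the ONNX parser unsqueezes and squeezes itself. Roughly, the graph tf2onnx sees is equivalent to this sketch (illustration only):

# roughly what tf.layers.conv1d builds internally, i.e. the pattern visible in the TRT log
x = tf.placeholder(tf.float32, shape=[1, 6, 10], name='inputs_1d')   # N x W x C
x4d = tf.expand_dims(x, axis=1)                                      # N x 1 x W x C
y4d = tf.layers.conv2d(x4d, 5, kernel_size=(1, 3), strides=(1, 1),
                       padding='same', activation=tf.nn.relu)
y = tf.squeeze(y4d, axis=1)                                          # back to N x W x C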