Pytorch tensor.size() to TensorRT6.0 failure

xmpeng · April 16, 2020, 2:59am

Description

When I convert a dummy model that uses tensor.size() from PyTorch to ONNX, the export generates onnx::Gather and onnx::ConstantOfShape together with some other ops. An example is shown below:

import torch
from torch import nn

feat_size = 32
in_channel = 64
out_channel = 128
kernel_size = 3
batch = 2
feat = torch.randn(batch, in_channel, feat_size, feat_size, device="cuda")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size, padding=(1, 1))

    def forward(self, x):
        # Reading the sizes here is what produces the Gather nodes in the export
        b, c, h, w = x.size()
        y = torch.ones((b, c, h, w), dtype=torch.float32, device="cuda")
        x = self.conv1(y)
        return x

model = Net().to("cuda")
torch.onnx.export(model, feat, "model.onnx", verbose=True)

When I tried to generate the TensorRT engine, the following error occurred:

In node 2 (importGather): UNSUPPORTED_NODE: Assertion failed: !(data->getType() == nvinfer1::DataType::kINT32 && nbDims == 1) && "Cannot perform gather on a shape tensor!"

Since onnx::Gather behaves like np.take, I tried to add a plugin layer that does the same thing. That failed because the input to Gather is a shape tensor, and a plugin layer cannot take a shape tensor as input.
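To make the np.take analogy concrete, here is a minimal pure-Python sketch of what the exported Shape → Gather pair computes for x.size() when the index is a scalar. The helper names `onnx_shape` and `onnx_gather_axis0` are hypothetical; they only model the ONNX semantics for a 1-D input and are not part of any library.

```python
def onnx_shape(dims):
    """Model ONNX Shape: return the dimensions as a 1-D list of ints."""
    return [int(d) for d in dims]


def onnx_gather_axis0(data, index):
    """Model ONNX Gather on a 1-D tensor with a scalar index (axis=0).

    For this case it matches np.take(data, index); negative indices
    count from the end.
    """
    n = len(data)
    if not -n <= index < n:
        raise IndexError(f"index {index} out of range for length {n}")
    return data[index]  # list indexing already handles negative indices


# Shape of the dummy input above: (batch, in_channel, feat_size, feat_size)
shape = onnx_shape((2, 64, 32, 32))
b, c, h, w = (onnx_gather_axis0(shape, i) for i in range(4))
print(b, c, h, w)
```

The point of the sketch is that the data input of Gather here is the 1-D integer output of Shape, which is exactly the kind of tensor the TensorRT 6.0 parser rejects.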

On the other hand, I observed that the Split op defined in /TensorRT/parsers/onnx/builtin_op_imports.cpp passes exactly such a shape tensor to gatherDimension() (defined in onnx2trt_utils.cpp). That made me wonder whether the Gather op is handled properly in onnx-tensorrt, so I commented out the assertion in its implementation in builtin_op_imports.cpp.

Furthermore, I found that onnx::ConstantOfShape is not supported in TensorRT 6.0 but is supported in TensorRT 7.0, and that its implementation does not involve any op that TensorRT 6.0 lacks. So I copied the TensorRT 7.0 implementation into TensorRT 6.0:

DEFINE_BUILTIN_OP_IMPORTER(ConstantOfShape)
{
    OnnxAttrs attrs(node);
    nvinfer1::ITensor* shape = &convertToTensor(inputs.at(0), ctx);
    ShapedWeights zeroWeights
        = ctx->createTempWeights(::ONNX_NAMESPACE::TensorProto_DataType_FLOAT, nvinfer1::Dims{1, 1});
    static_cast<float*>(zeroWeights.values)[0] = 0.f;
    auto valueWeights = TensorOrWeights{attrs.get("value", zeroWeights)};
    nvinfer1::ITensor* value = &convertToTensor(valueWeights, ctx);
    return {{constantOfShape(ctx, value, shape)}};
}
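For reference, the semantics this importer has to reproduce can be sketched in plain Python. This is only a model of the ONNX ConstantOfShape op (fill a tensor of the requested shape with one value, defaulting to 0.0 as in the importer above), not TensorRT code; `constant_of_shape` and `shape_of` are hypothetical helper names.

```python
def constant_of_shape(shape, value=0.0):
    """Model ONNX ConstantOfShape: nested lists of the given shape filled with value."""
    if any(d < 0 for d in shape):
        raise ValueError("dimensions must be non-negative")
    if not shape:
        return value  # an empty shape yields a scalar
    head, *tail = shape
    return [constant_of_shape(tail, value) for _ in range(head)]


def shape_of(nested):
    """Recover the shape of a nested list (assumes no zero-sized dimensions)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


# torch.ones((n, c, h, w)) for a 1x3x5x5 input corresponds to value=1.0
ones = constant_of_shape([1, 3, 5, 5], value=1.0)
print(shape_of(ones))
```

A check like `shape_of` on the expected output is one way to pin down where a shape mismatch first appears when comparing against the parsed network.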

After these two changes, there is a shape mismatch somewhere in ConstantOfShape.

Would you please add support for tensor.size() and more documentation on addGather?

Environment

TensorRT Version: 6.0.1.5
GPU Type: V100
Nvidia Driver Version:
CUDA Version: 10.1
CUDNN Version: 7.6
Operating System + Version: Ubuntu18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

SunilJB · April 16, 2020, 6:14am

Hi,

PyTorch 1.4 models are not supported in TRT 6.0.
I recommend using TRT 7.

Thanks

xmpeng · April 16, 2020, 6:46am

Thanks for your prompt reply. Due to compatibility issues, I cannot upgrade my TRT from 6.0 to 7.0. Which version of PyTorch would you suggest to solve this problem? I will check whether it is the PyTorch version that causes this problem and keep you posted.
Many thanks.

SunilJB · April 16, 2020, 6:49am

Hi,

You can use PyTorch 1.3 or a lower version with TRT 6.
Custom plugins need to be created for any ops that are not supported in TRT 6.

Thanks

xmpeng · April 16, 2020, 2:35pm

Hi SunilJB,
I have changed the PyTorch version to 1.3.0. Here is my Python script to reproduce the issue:

import torch
from torch import nn
import tensorrt as trt

def test_trt_export(model_name):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')

    # Create the network in explicit-batch mode
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        builder.fp16_mode = False
        builder.max_batch_size = 1

        with open(model_name, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        engine = builder.build_cuda_engine(network)
        if engine is None:
            print('ERROR: Failed to build the CUDA engine.')
            return None
        print('CUDA engine built successfully!')
        return engine

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

    def forward(self, x):
        n, c, h, w = x.size()
        y = torch.ones((n, c, h, w), dtype=torch.float32)
        return y

feat = torch.randn(1, 3, 5, 5)
model = Net()

torch.onnx.export(model, feat, "model.onnx", verbose=True)

test_trt_export("model.onnx")

The error message is:

ERROR: Failed to parse the ONNX file.
In node 2 (importGather): UNSUPPORTED_NODE: Assertion failed: !(data->getType() == nvinfer1::DataType::kINT32 && nbDims == 1) && "Cannot perform gather on a shape tensor!"
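Read literally, the asserted expression in this message rejects any Gather whose data input is a 1-D INT32 tensor, which is what the output of Shape looks like. A small Python paraphrase of that condition, assuming nothing beyond the expression printed in the error (the `DataType` enum and `gather_import_allowed` are hypothetical stand-ins, not TensorRT API):

```python
from enum import Enum, auto


class DataType(Enum):
    """Hypothetical stand-in for the tensor data types involved."""
    FLOAT = auto()
    INT32 = auto()


def gather_import_allowed(dtype, nb_dims):
    """Mirror of the asserted expression: !(type == kINT32 && nbDims == 1)."""
    looks_like_shape_tensor = dtype is DataType.INT32 and nb_dims == 1
    return not looks_like_shape_tensor


# The output of Shape for a 4-D input is a 1-D INT32 tensor, so it is rejected
print(gather_import_allowed(DataType.INT32, 1))
# A 4-D float feature map passes the check
print(gather_import_allowed(DataType.FLOAT, 4))
```

Under this reading, the check is a heuristic on type and rank rather than a test of whether the tensor really is a shape tensor.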

The documentation states that IGatherLayer supports the following type inference algebra:
IGatherLayer: t × t → t
So I am confused about why the assertion prevents a shape tensor from being used as the input.
Thanks.

SunilJB · April 17, 2020, 9:37am

Hi,

Can you try the solution suggested in the link below:
https://github.com/onnx/onnx-tensorrt/issues/283#issuecomment-545703178

Thanks

xmpeng · April 17, 2020, 11:40am

I have tried this, but onnx-simplifier raises an error on models that contain custom ops not registered in symbolic opset 9. I think this is related to ONNX's checker.

SunilJB · April 17, 2020, 12:45pm

Hi,

Can you try generating the ONNX model using the latest ONNX opset version?

Thanks

xmpeng · April 17, 2020, 1:18pm

I am using the symbolic method to register my op (a subclass of autograd.Function), following the torch.onnx page of the PyTorch 1.12 documentation. It is not an ATen operator. Instead of manipulating the ONNX op or the ONNX graph to bypass the problem, is there anything that can be done about the Gather operation itself?
In TensorRT 7, onnx-tensorrt defines Gather like this:

DEFINE_BUILTIN_OP_IMPORTER(Gather)
{
    nvinfer1::ITensor& data = convertToTensor(inputs.at(0), ctx);
    // TRT does not support BOOL input types for this node
    ASSERT(data.getType() != nvinfer1::DataType::kBOOL, ErrorCode::kUNSUPPORTED_NODE);
    nvinfer1::ITensor& indices = convertToTensor(inputs.at(1), ctx);
    OnnxAttrs attrs(node, ctx);
    int axis = attrs.get<int>("axis", 0);
    int nbDims = inputs.at(0).shape().nbDims;
    TRT_CHECK(convertAxis(axis, nbDims));
    LOG_VERBOSE("Using Gather axis: " << axis);
    RETURN_FIRST_OUTPUT(ctx->network()->addGather(data, indices, axis));
}
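The only validation left in this importer, apart from the BOOL check, is the axis handling. The body of convertAxis is not shown in this thread; the sketch below assumes it follows the ONNX convention for Gather, where axis must lie in [-r, r-1] and a negative axis counts from the last dimension. `convert_axis` is a hypothetical Python stand-in, not the onnx-tensorrt function.

```python
def convert_axis(axis, nb_dims):
    """Normalize an ONNX-style axis to a non-negative index.

    Assumption: valid range is [-nb_dims, nb_dims - 1], as in the ONNX
    Gather specification.
    """
    if not -nb_dims <= axis < nb_dims:
        raise ValueError(f"axis {axis} out of range for {nb_dims} dimensions")
    return axis + nb_dims if axis < 0 else axis


# Gather on the 1-D output of Shape uses axis 0, which is already normalized
print(convert_axis(0, 1))
# A negative axis on a 4-D tensor maps to the last dimension
print(convert_axis(-1, 4))
```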

There is no such assertion on a shape tensor input, and no documentation explains how Gather differs between TensorRT 6.0 and TensorRT 7.0.
Thanks for your attention.

ersheng · April 21, 2020, 5:04am

It is true that new ONNX support was implemented in TensorRT 7.0, which is why this restrictive assertion was removed.
This issue (Gather does not support shape tensors) is solved in TensorRT 7.0.
