PyTorch tensor.size() to TensorRT 6.0 conversion failure

Description

When I transform a dummy model from PyTorch to ONNX using the op tensor.size(), the export generates onnx::Gather and onnx::ConstantOfShape along with some other ops. An example is shown below:

import torch
from torch import nn

feat_size = 32
in_channel = 64
out_channel = 128
kernel_size = 3
batch = 2
feat = torch.randn(batch, in_channel, feat_size, feat_size, device="cuda")

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channel, out_channel, kernel_size, padding=(1, 1))

    def forward(self, x):
        # Unpacking x.size() is what produces onnx::Gather in the exported graph
        b, c, h, w = x.size()
        y = torch.ones((b, c, h, w), dtype=torch.float32, device="cuda")
        x = self.conv1(y)
        return x

model = Net().to("cuda")
torch.onnx.export(model, feat, "model.onnx", verbose=True)

When I tried to generate a TensorRT engine, the following error occurred:

In node 2 (importGather): UNSUPPORTED_NODE: Assertion failed: !(data->getType() == nvinfer1::DataType::kINT32 && nbDims == 1) && "Cannot perform gather on a shape tensor!"

Since onnx::Gather behaves in the same manner as np.take, I tried to add a plugin layer that does the same thing. This failed because the input to Gather was a shape tensor, and a plugin layer cannot take a shape tensor as input.
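To make the np.take comparison concrete, here is a minimal pure-Python sketch of onnx::Gather semantics, using nested lists in place of tensors (the function name `gather` is hypothetical, and negative axes are not handled). Applied to the 1-D output of onnx::Shape with a scalar index, it extracts a single dimension, which is what unpacking `x.size()` becomes after export, and why the data input of Gather here is a shape tensor.

```python
from typing import Any, Sequence, Union


def gather(data: Sequence[Any], indices: Union[int, Sequence[Any]], axis: int = 0) -> Any:
    """Sketch of onnx::Gather (like np.take(data, indices, axis)) on nested lists."""
    if axis == 0:
        if isinstance(indices, int):
            # Scalar index: the gathered axis disappears from the output.
            return data[indices]
        # Index list: its shape replaces the gathered axis.
        return [gather(data, i, 0) for i in indices]
    # Recurse until the target axis is the outermost one.
    return [gather(row, indices, axis - 1) for row in data]


shape = [1, 3, 5, 5]  # 1-D result of onnx::Shape for a (1, 3, 5, 5) input
n, c, h, w = (gather(shape, i) for i in range(4))
print(n, c, h, w)  # 1 3 5 5
```

Each scalar-index Gather on `shape` yields one dimension; TensorRT 6.0 sees `shape` as a shape tensor, which is exactly the case its importer rejects.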

On the other hand, I observed that the Split importer defined in /TensorRT/parsers/onnx/builtin_op_imports.cpp passes exactly such a shape tensor as input to gatherDimension() (defined in onnx2trt_utils.cpp). This made me wonder whether the Gather op is handled properly in onnx-tensorrt, so I commented out the assertion in the Gather implementation in builtin_op_imports.cpp.

Furthermore, I found that onnx::ConstantOfShape is not supported in TensorRT 6.0 but is supported in TensorRT 7.0, and its implementation does not rely on any op that TensorRT 6.0 lacks. So I copied the TensorRT 7.0 implementation into TensorRT 6.0:

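DEFINE_BUILTIN_OP_IMPORTER(ConstantOfShape)
{
    OnnxAttrs attrs(node);
    // Input 0: 1-D tensor holding the requested output shape
    nvinfer1::ITensor* shape = &convertToTensor(inputs.at(0), ctx);
    // Default fill value: a one-element float tensor holding 0
    ShapedWeights zeroWeights
        = ctx->createTempWeights(::ONNX_NAMESPACE::TensorProto_DataType_FLOAT, nvinfer1::Dims{1, 1});
    static_cast<float*>(zeroWeights.values)[0] = 0.f;
    // Use the "value" attribute if present, otherwise the zero default
    auto valueWeights = TensorOrWeights{attrs.get("value", zeroWeights)};
    nvinfer1::ITensor* value = &convertToTensor(valueWeights, ctx);
    // constantOfShape() is the helper copied over from TensorRT 7.0
    return {{constantOfShape(ctx, value, shape)}};
}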
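For reference, the ONNX ConstantOfShape op that this importer is meant to reproduce can be sketched in pure Python as follows. This is a minimal illustration of the ONNX semantics only (not of the TensorRT helper), with nested lists standing in for tensors and a hypothetical function name `constant_of_shape`: the input is a 1-D list of dimensions, and every output element is the single element of the `value` attribute, which defaults to a float 0.

```python
from typing import Any, Sequence


def constant_of_shape(shape: Sequence[int], value: Sequence[float] = (0.0,)) -> Any:
    """Sketch of onnx::ConstantOfShape using nested lists as tensors."""
    # The ONNX "value" attribute must hold exactly one element.
    if len(value) != 1:
        raise ValueError("value must be a one-element tensor")
    fill = value[0]

    def build(dims: Sequence[int]) -> Any:
        # An empty shape yields a scalar, as the ONNX spec describes.
        if not dims:
            return fill
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(list(shape))


print(constant_of_shape([2, 3]))  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(constant_of_shape([], value=(1.0,)))  # 1.0
```

Comparing the shapes a TensorRT build produces against this reference, for the same `shape` input, is one way to narrow down where a mismatch appears.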

After these two steps, a shape mismatch occurs somewhere in ConstantOfShape.

Could you please add support for tensor.size() and more documentation on addGather?

Environment

TensorRT Version: 6.0.1.5
GPU Type: V100
Nvidia Driver Version:
CUDA Version: 10.1
CUDNN Version: 7.6
Operating System + Version: Ubuntu18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.4
Baremetal or Container (if container which image + tag):

Hi,

PyTorch 1.4 models are not supported in TRT 6.0.
We recommend using TRT 7.

Thanks

Thanks for your prompt reply. Due to compatibility issues, I cannot upgrade TRT from 6.0 to 7.0. Which version of PyTorch would you suggest to solve this problem? I will check whether the PyTorch version is what causes this problem and keep you posted.
Many thanks.

Hi,

You can use PyTorch 1.3 or a lower version with TRT 6.
Custom plugins need to be created for any ops that are not supported in TRT 6.

Thanks

Hi SunilJB,
I have changed the PyTorch version to 1.3.0. Here is my Python script to reproduce the problem:

import torch
from torch import nn
import tensorrt as trt

def test_trt_export(model_name):
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

    with trt.Builder(TRT_LOGGER) as builder, \
            builder.create_network(explicit_batch) as network, \
            trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        builder.fp16_mode = False
        builder.max_batch_size = 1

        with open(model_name, 'rb') as model:
            if not parser.parse(model.read()):
                print('ERROR: Failed to parse the ONNX file.')
                for error in range(parser.num_errors):
                    print(parser.get_error(error))
                return None

        engine = builder.build_cuda_engine(network)
        # build_cuda_engine returns None when the build fails
        if engine is None:
            print('ERROR: Failed to build the CUDA engine.')
            return None
        print('CUDA engine built successfully!')
        return engine

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

    def forward(self, x):
        n, c, h, w = x.size()
        y = torch.ones((n, c, h, w), dtype=torch.float32)
        return y

feat = torch.randn(1, 3, 5, 5)
model = Net()

torch.onnx.export(model, feat, "model.onnx", verbose=True)

test_trt_export("model.onnx")

The error message is:

ERROR: Failed to parse the ONNX file.
In node 2 (importGather): UNSUPPORTED_NODE: Assertion failed: !(data->getType() == nvinfer1::DataType::kINT32 && nbDims == 1) && "Cannot perform gather on a shape tensor!"

The documentation states that IGatherLayer supports the following type inference algebra:
IGatherLayer: t × t → t
So I am confused about why the assertion prevents a shape tensor from being used as the input.
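To find in advance which nodes of an exported model are likely to hit this assertion, one option is to scan the graph for Gather nodes whose data input comes straight from a Shape node. The sketch below is a hypothetical helper: `Node` is a minimal stand-in for an ONNX NodeProto, and the graph is hand-written for illustration rather than taken from a real export. With the `onnx` package, the entries of `onnx.load(path).graph.node` expose `op_type`, `input` and `output` fields that could be converted into this form. It only catches direct Shape → Gather edges, not ones routed through Cast or other ops.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    """Minimal stand-in for an ONNX NodeProto (op_type, input, output)."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


def find_shape_gathers(nodes: List[Node]) -> List[int]:
    """Return indices of Gather nodes whose data input comes straight from a Shape node."""
    shape_outputs = {out for node in nodes if node.op_type == "Shape" for out in node.outputs}
    return [
        i for i, node in enumerate(nodes)
        if node.op_type == "Gather" and node.inputs and node.inputs[0] in shape_outputs
    ]


# Hypothetical, hand-written graph resembling a traced `n, c, h, w = x.size()`.
graph = [
    Node("Shape", ["x"], ["s"]),
    Node("Constant", [], ["idx0"]),
    Node("Gather", ["s", "idx0"], ["n"]),
    Node("Shape", ["x"], ["s1"]),
    Node("Constant", [], ["idx1"]),
    Node("Gather", ["s1", "idx1"], ["c"]),
]
print(find_shape_gathers(graph))  # [2, 5]
```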
Thanks.

Hi,

Can you try the solution suggested in the link below:

Thanks

I have tried this, but onnx-simplifier fails on models that contain custom ops not registered in symbolic opset 9. I think this is related to the ONNX checker.

Hi,

Can you try generating the ONNX model using the latest ONNX opset version?

Thanks

I am using the symbolic method to register my op (a subclass of autograd.Function), following https://pytorch.org/docs/stable/onnx.html#non-aten-operators. It is not an ATen operator. Instead of manipulating the ONNX op or the ONNX graph to bypass the problem, is there anything that can be done about the Gather operation itself?
In TensorRT 7, onnx-tensorrt defines the Gather importer like this:

DEFINE_BUILTIN_OP_IMPORTER(Gather)
{
    nvinfer1::ITensor& data = convertToTensor(inputs.at(0), ctx);
    // TRT does not support BOOL input types for this node
    ASSERT(data.getType() != nvinfer1::DataType::kBOOL, ErrorCode::kUNSUPPORTED_NODE);
    nvinfer1::ITensor& indices = convertToTensor(inputs.at(1), ctx);
    OnnxAttrs attrs(node, ctx);
    int axis = attrs.get<int>("axis", 0);
    int nbDims = inputs.at(0).shape().nbDims;
    TRT_CHECK(convertAxis(axis, nbDims));
    LOG_VERBOSE("Using Gather axis: " << axis);
    RETURN_FIRST_OUTPUT(ctx->network()->addGather(data, indices, axis));
}

There is no such assertion on shape tensor inputs here, and there is no documentation explaining how Gather differs between TensorRT 6.0 and TensorRT 7.0.
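The difference between the two importers' data checks can be summarized in a small Python sketch. It models only the two assertions quoted in this thread (the TensorRT 6.0 error message and the TensorRT 7.0 `ASSERT` on `kBOOL`), not anything else the importers may check; the function names are hypothetical. Note that the TensorRT 6.0 condition looks only at data type and rank, so any 1-D INT32 data input is rejected, whether or not it is really a shape tensor.

```python
def trt6_gather_accepts(dtype: str, nb_dims: int) -> bool:
    # TRT 6.0 importGather: ASSERT(!(data->getType() == kINT32 && nbDims == 1))
    return not (dtype == "kINT32" and nb_dims == 1)


def trt7_gather_accepts(dtype: str, nb_dims: int) -> bool:
    # TRT 7.0 Gather importer: ASSERT(data.getType() != kBOOL); rank is not checked here
    return dtype != "kBOOL"


cases = {
    "onnx::Shape output (1-D INT32)": ("kINT32", 1),
    "FP32 feature map (4-D)": ("kFLOAT", 4),
    "BOOL mask (2-D)": ("kBOOL", 2),
}
for name, (dtype, nb_dims) in cases.items():
    print(f"{name}: TRT6={trt6_gather_accepts(dtype, nb_dims)} TRT7={trt7_gather_accepts(dtype, nb_dims)}")
```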
Thanks for your attention.

It is true that new ONNX support was implemented in TensorRT 7.0, which is why this assertion was removed.
This issue (Gather not supporting shape tensors) is solved in TensorRT 7.0.