This issue is related to getDimensions() and getBindingDimensions() returning different results on the host and on the Jetson AGX Xavier.
So I thought the problem was something wrong with getDimensions(), but I've found an issue with the Slice layer in the ONNX-to-TensorRT parser:
When exporting a network defined in Pytorch as:
class SNET(nn.Module):
    def __init__(self):
        super(SNET, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, 3)

    def forward(self, x):
        x = self.conv1(x)
        return x
With an input size of 100x100, the exported ONNX graph looks like this (visualized using Netron):
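For reference, the output size of that convolution is easy to verify with the standard formula. A quick stdlib-only sketch (the helper name is mine, not from the sample code; the 100x100 input and 3x3 kernel come from the model above):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # Standard conv output-size formula: floor((size + 2*p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

# nn.Conv2d(1, 1, 3) uses default stride=1 and padding=0, so on a
# 100x100 input the output should be 1x1x98x98:
print(conv2d_out(100, 3))  # -> 98
```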
I've modified the sampleOnnxMNIST sample code from TensorRT just to parse the network and print the input and output sizes. I've attached it to the related issue, but I'm adding a simpler version here to make it easier to reproduce. The network parses with no errors, and the input and output sizes are correct, as the following picture shows:
Now, if I add a Slice layer like this in PyTorch:
class SNET(nn.Module):
    def __init__(self):
        super(SNET, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, 3)

    def forward(self, x):
        x = self.conv1(x)
        return x[:, :, 24:74, 24:74]
The exported ONNX graph looks like this:
which of course also makes sense. Now, when parsing this network, the input dimensions are correct but the output dimensions are wrong, as you can see here and can also test yourself.
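For comparison, the correct output shape is easy to compute by hand: the 3x3 convolution takes 100x100 down to 98x98, and the slice [24:74] keeps 50 elements along each spatial axis. A stdlib-only sketch of that arithmetic (the helper name is mine, not from the sample code):

```python
def slice_len(start, stop, size):
    # Length of a Python-style slice [start:stop) over an axis of the given size
    return max(0, min(stop, size) - max(start, 0))

conv_out = 100 - 3 + 1            # 3x3 conv, stride 1, no padding -> 98
h = slice_len(24, 74, conv_out)   # from x[:, :, 24:74, 24:74]
w = slice_len(24, 74, conv_out)
print((1, 1, h, w))  # expected output dimensions: (1, 1, 50, 50)
```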
There are no error codes and no warnings (except about the ir_version, but I've tested different versions and the problem persists, including with PyTorch 1.1 for the export, which is the version recommended in the documentation for this TensorRT release). There is nothing, just the wrong dimensions, and of course the network does not work. I tested multiple things before confirming this. I've reproduced it on Xavier with JetPack 4.3 and also in an NVIDIA Docker image from NGC. Very tricky behavior and the cause of a big headache. I'm tagging @AastaLLL since he saw the other related issue.
You can download the code here: https://drive.google.com/open?id=1GUw5pWP73Ej_FmdxjhZVvtwzuIXdtxXh with both sample nets included. I'll test on TensorRT 7 to see if the behavior is the same, but I remember this issue did not happen on TensorRT 5.
JetPack 4.3 on Xavier
Docker image from NGC:
docker pull nvcr.io/nvidia/tensorrt:19.12-py3
docker run --gpus all -it --rm -v /yourVOLUME:/workspace/smpOnnx nvcr.io/nvidia/tensorrt:19.12-py3
Steps To Reproduce
Make the project and run. To test both behaviors, change sampleOnnx.cpp line 211 from:
params.onnxFileName = "snet_slice.onnx";
to:
params.onnxFileName = "snet_noslice.onnx";