IScatterLayer cannot be used to compute a shape tensor

Description

I used Polygraphy to check an ONNX model produced by mx2onnx, but the run failed:

[E] [graph.cpp::symbolicExecute::611] Error Code 4: Internal Error (node_of_ssdanchorgenerator0_slice_like0_ends: an IScatterLayer cannot be used to compute a shape tensor)
[E] ModelImporter.cpp:726: While parsing node number 260 [Slice -> "ssdanchorgenerator0_slice_like0"]:
[E] ModelImporter.cpp:727: --- Begin node ---
[E] ModelImporter.cpp:728: input: "ssdanchorgenerator0_anchor_0"
    input: "ssdanchorgenerator0_slice_like0_starts"
    input: "ssdanchorgenerator0_slice_like0_ends"
    output: "ssdanchorgenerator0_slice_like0"
    op_type: "Slice"
[E] ModelImporter.cpp:729: --- End node ---
[E] ModelImporter.cpp:732: ERROR: ModelImporter.cpp:185 In function parseGraph:
    [6] Invalid Node - node_of_ssdanchorgenerator0_slice_like0
    [graph.cpp::symbolicExecute::611] Error Code 4: Internal Error (node_of_ssdanchorgenerator0_slice_like0_ends: an IScatterLayer cannot be used to compute a shape tensor)
[E] In node 260 (parseGraph): INVALID_NODE: Invalid Node - node_of_ssdanchorgenerator0_slice_like0
    [graph.cpp::symbolicExecute::611] Error Code 4: Internal Error (node_of_ssdanchorgenerator0_slice_like0_ends: an IScatterLayer cannot be used to compute a shape tensor)
[!] Could not parse ONNX correctly
[E] FAILED | Runtime: 2.718s | Command: /home/ANT.AMAZON.COM/zhdongxi/.pyenv/versions/MXNET_TEST/bin/polygraphy run mxnet_exported_corner_exp.onnx --trt

I think TensorRT has some restrictions that rule out certain ops in an ONNX model.

In this case, I tried to convert an MXNet model to ONNX and run it in TensorRT, and the operator at issue is slice_like. Its ONNX implementation can be found here: mxnet/_op_translations_opset12.py at e2ed553f89ec46f0366e005ff0768b153bba3f94 · apache/mxnet · GitHub

I think feeding the “Shape” output to “Scatter” is causing the issue here, please correct me if I am wrong. Is there a workaround for this case? Can anybody provide some tips about it?

Environment

TensorRT Version: 8.5.1.7
GPU Type: 1080Ti
Nvidia Driver Version: 515.65.07
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Ubuntu18.04
Python Version (if applicable): 3.8.13
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
MXNet Version (if applicable): 1.9.1
ONNX Version (if applicable): 1.12.0

Relevant Files

For confidentiality reasons I cannot post the model here, but as mentioned in the description, the operator at issue is slice_like, and its implementation can be found here: mxnet/_op_translations_opset12.py at e2ed553f89ec46f0366e005ff0768b153bba3f94 · apache/mxnet · GitHub.

Steps To Reproduce

polygraphy run MY_MODEL.onnx --trt

Hi,

Hope the following may help you.

Thank you.

Thanks for the response. The post you sent makes sense to me, but my issue is a bit different: Shape is not an input to the network. I am still not sure which node causes the issue.

In the mx2onnx source code, the slice_like conversion function emits the following nodes:

    nodes += [
        make_node('Shape', [input_nodes[0]], [name+'_shape_0']),
        make_node('Shape', [input_nodes[1]], [name+'_shape_1']),
        make_node('Shape', [name+'_shape_0'], [name+'_dim_0']),
        make_node('Less', [name+'_axes_', name+'_0'], [name+'_less']),
        make_node('Cast', [name+'_less'], [name+'_mask'], to=int(TensorProto.INT64)),
        make_node('Mul', [name+'_mask', name+'_dim_0'], [name+'_mul']),
        make_node('Add', [name+'_axes_', name+'_mul'], [name+'_axes']),
        make_node('ConstantOfShape', [name+'_dim_0'], [name+'_starts'], value=zero),
        make_node('GatherND', [name+'_shape_1', name+'_axes'], [name+'_gather']),
        make_node('ScatterND', [name+'_shape_0', name+'_axes', name+'_gather'],
                  [name+'_ends']),
        make_node('Slice', [input_nodes[0], name+'_starts', name+'_ends'], [name])
    ]
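
To make the data flow easier to follow, here is a rough NumPy emulation of what these nodes compute. It is only a sketch, not the MXNet or TensorRT implementation: the function names slice_like_ends and slice_like are made up for illustration, and axes is assumed to be a flat list of axis indices. It shows that the ends tensor consumed by Slice is the ScatterND output, which is the tensor named in the error message (..._slice_like0_ends).

```python
import numpy as np


def slice_like_ends(shape_0, shape_1, axes):
    """Emulate the Shape/Less/Cast/Mul/Add/GatherND/ScatterND chain (illustrative only)."""
    shape_0 = np.asarray(shape_0, dtype=np.int64)   # Shape(input 0)
    shape_1 = np.asarray(shape_1, dtype=np.int64)   # Shape(input 1)
    axes = np.asarray(axes, dtype=np.int64)         # assumed: flat list of axes
    dim_0 = shape_0.size                            # Shape(shape_0): rank of input 0
    mask = (axes < 0).astype(np.int64)              # Less + Cast
    axes = axes + mask * dim_0                      # Mul + Add: wrap negative axes
    gathered = shape_1[axes]                        # GatherND: target sizes from input 1
    ends = shape_0.copy()
    ends[axes] = gathered                           # ScatterND: overwrite sizes on those axes
    starts = np.zeros(dim_0, dtype=np.int64)        # ConstantOfShape with value zero
    return starts, ends


def slice_like(a, x, axes):
    """Apply the final Slice node using the computed starts/ends."""
    starts, ends = slice_like_ends(a.shape, x.shape, axes)
    return a[tuple(slice(int(s), int(e)) for s, e in zip(starts, ends))]


a = np.zeros((1, 1, 8, 10, 4))
x = np.zeros((2, 3, 5, 6))
starts, ends = slice_like_ends(a.shape, x.shape, axes=[2, 3])
print(ends)                                  # [1 1 5 6 4]
print(slice_like(a, x, axes=[2, 3]).shape)   # (1, 1, 5, 6, 4)
```

Every value in this chain depends only on tensor shapes, so with static input shapes the whole ends computation is a constant.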

Could you please point out which operation is causing the issue here? Is it the shape_0 input to the ScatterND node, or something else?

Thanks

Hi,

Could you please share the ONNX model with us, here or via DM, so that we can debug it?

Thank you.

I ran into this bug as well. It comes from the slice_like op in MXNet. A tensor a with shape [1, 1, H, W, A] is sliced to [1, 1, h, w, A] according to the shape of x, which is [B, N, h, w]:
slice_like(a, x)
This is equivalent to:

h, w = x.shape[2], x.shape[3]
a = a[:, :, :h, :w, :]

I fixed it by running the model through onnx-simplifier:
daquexian/onnx-simplifier: Simplify your onnx model (github.com)
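
For anyone who wants to try the same thing, here is a minimal sketch of calling onnx-simplifier from Python, assuming the onnx and onnxsim packages are installed. The helper name simplify_onnx and the output file name are hypothetical. The likely reason this helps (an assumption, not confirmed in this thread) is that with static input shapes the simplifier can fold the Shape/GatherND/ScatterND chain into a constant, so TensorRT no longer sees a ScatterND feeding the Slice ends input.

```python
def simplify_onnx(in_path, out_path):
    """Run onnx-simplifier on a model file and save the result (illustrative helper)."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    import onnx
    from onnxsim import simplify

    model = onnx.load(in_path)
    # simplify() returns the simplified model and a flag saying whether
    # the simplifier's own validation check passed.
    simplified, ok = simplify(model)
    if not ok:
        raise RuntimeError("onnx-simplifier could not validate the simplified model")
    onnx.save(simplified, out_path)
    return out_path


# Example call (file names are placeholders):
# simplify_onnx("MY_MODEL.onnx", "MY_MODEL_sim.onnx")
```

The simplified file can then be checked the same way as before, with polygraphy run MY_MODEL_sim.onnx --trt.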