Incorrect slicing of boolean constant tensor with step size >1

Description

When importing an ONNX model that contains a constant boolean tensor (in the form of an initializer), slicing over this tensor with a step size > 1 is performed incorrectly.

The issue seems specific to the combination of a constant boolean tensor and a step size > 1: removing either condition makes the problem go away.
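For reference, this is the slicing behavior that NumPy (and PyTorch) define for a boolean [4, 4, 4] tensor with a step of 2 along the first axis; the seed and variable names below are illustrative only:

```python
import numpy as np

# Stand-in for the model's constant boolean initializer.
rng = np.random.default_rng(0)
b = rng.random((4, 4, 4)) > 0.5

# Slicing [0:4:2] along axis 0 keeps slices 0 and 2 only.
y = b[0:4:2, :, :]
assert y.shape == (2, 4, 4)
assert (y[0] == b[0]).all() and (y[1] == b[2]).all()
```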

Code Examples

The following code slices a tensor of shape [4, 4, 4] with a step size of 2, i.e., tensor[0:4:2, :, :], but the result disagrees with both PyTorch and NumPy.

import torch
import onnx
import numpy as np


def run_trt(inputs, input_names, y_tch):
    # assume output names are ["o0"]
    # copy-paste from https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#import_model_python
    import tensorrt as trt
    logger = trt.Logger(trt.Logger.WARNING)

    builder = trt.Builder(logger)

    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    success = parser.parse_from_file("output.onnx")
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))
    assert success

    config = builder.create_builder_config()
    config.set_memory_pool_limit(
        trt.MemoryPoolType.WORKSPACE, 1 << 20)  # 1 MiB
    serialized_engine = builder.build_serialized_network(network, config)

    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized_engine)

    context = engine.create_execution_context()

    buffers = [None] * engine.num_bindings
    inputs_cuda = [x.cuda() for x in inputs]
    for x_cuda, name in zip(inputs_cuda, input_names):
        input_idx = engine[name]
        buffers[input_idx] = x_cuda.data_ptr()

    y_cuda = torch.zeros_like(torch.from_numpy(y_tch)).cuda()
    output_idx = engine['o0']
    buffers[output_idx] = y_cuda.data_ptr()

    context.execute_v2(buffers)
    return y_cuda.cpu().numpy()


def main():
    class Model(torch.nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.b = torch.rand(size=[4, 4, 4]) > 0.5
            self.b = torch.nn.Parameter(self.b, requires_grad=False)

        def forward(self):
            y = self.b[0:4:2, :, :]
            # y = torch.where(y, torch.zeros_like(x), x)
            return y

    # inputs = (torch.rand(size=[2, 4, 4]), )
    inputs = ()
    model = Model()
    y_tch = model(*inputs).numpy()
    # input_names = ["i0"]
    input_names = []
    torch.onnx.export(model, inputs, "output.onnx", verbose=True,
                      input_names=input_names, output_names=["o0"], opset_version=14)
    onnx.checker.check_model("output.onnx", full_check=True)

    y_trt = run_trt(inputs, input_names, y_tch)
    np.testing.assert_allclose(y_tch, model.b.numpy()[0:4:2, :, :],
                               err_msg='Torch v.s. NumPy failed')
    np.testing.assert_allclose(y_tch, y_trt, err_msg='Torch v.s. TRT failed')


main()

The visualization of the model:

[image: visualization of the exported model]

The output:

AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Torch v.s. TRT failed
Mismatched elements: 14 / 32 (43.8%)
 x: array([[[False,  True, False,  True],
        [ True, False, False,  True],
        [ True, False,  True,  True],...
 y: array([[[ True,  True,  True,  True],
        [ True,  True, False, False],
        [ True,  True,  True,  True],...

Note that this model doesn’t have any inputs because it has been minimized for bug reproduction. The same bug is still triggered when more layers/inputs are added; for example, the code can be changed to:

def main():
    class Model(torch.nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.b = torch.rand(size=[4, 4, 4]) > 0.5
            self.b = torch.nn.Parameter(self.b, requires_grad=False)

        def forward(self, x):
            y = self.b[0:4:2, :, :]
            y = torch.where(y, torch.zeros_like(x), x)
            return y

    inputs = (torch.rand(size=[2, 4, 4]), )
    # inputs = ()
    model = Model()
    y_tch = model(*inputs).numpy()
    input_names = ["i0"]
    # input_names = []
    torch.onnx.export(model, inputs, "output.onnx", verbose=True,
                      input_names=input_names, output_names=["o0"], opset_version=14)
    onnx.checker.check_model("output.onnx", full_check=True)

    y_trt = run_trt(inputs, input_names, y_tch)
    # np.testing.assert_allclose(y_tch, model.b.numpy()[0:4:2, :, :],
    #                            err_msg='Torch v.s. NumPy failed')
    np.testing.assert_allclose(y_tch, y_trt, err_msg='Torch v.s. TRT failed')
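For reference, the expected output of this second variant can be computed directly in NumPy (a sketch only; `b` and `x` stand in for the model's constant mask and the input `i0`):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((4, 4, 4)) > 0.5               # stands in for the constant mask
x = rng.random((2, 4, 4)).astype(np.float32)  # stands in for input i0

# Equivalent of: y = torch.where(self.b[0:4:2], torch.zeros_like(x), x)
y = np.where(b[0:4:2, :, :], np.zeros_like(x), x)
assert y.shape == (2, 4, 4)
# Wherever the sliced mask is True, the output must be exactly zero;
# elsewhere it must pass x through unchanged.
assert (y[b[0:4:2]] == 0).all()
assert (y[~b[0:4:2]] == x[~b[0:4:2]]).all()
```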

The model looks like:
[image: visualization of the second model]
And the error:

AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Torch v.s. TRT failed
Mismatched elements: 15 / 32 (46.9%)
Max absolute difference: 0.98568505
Max relative difference: 1.
 x: array([[[0.      , 0.477325, 0.983834, 0.      ],
        [0.829976, 0.      , 0.290815, 0.      ],
        [0.      , 0.      , 0.      , 0.343628],...
 y: array([[[0.      , 0.      , 0.      , 0.      ],
        [0.      , 0.      , 0.290815, 0.773806],
        [0.      , 0.      , 0.      , 0.      ],...

Environment

TensorRT Version: 8.4.0.6
GPU Type: RTX2080
Nvidia Driver Version: 495.29.05
CUDA Version: 11.5
CUDNN Version: 8.3.0
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11.0
Baremetal or Container (if container which image + tag): Container built on top of nvcr.io/nvidia/tensorrt:21.11-py3

Relevant Files

ONNX file of the first model
output.onnx.zip (1.0 KB)

Steps To Reproduce

Run the provided python script.
Full error log (of the first code snippet)

/workspace/.venv/trt-8.4.0.6/lib/python3.8/site-packages/torch/onnx/utils.py:363: UserWarning: No input args
  warnings.warn("No input args")
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
Warning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied.
graph(%b : Bool(4, 4, 4, strides=[16, 4, 1], requires_grad=0, device=cpu)):
  %1 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={0}]() # test.py:444:0
  %2 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={0}]() # test.py:444:0
  %3 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={4}]() # test.py:444:0
  %4 : Long(1, strides=[1], device=cpu) = onnx::Constant[value={2}]() # test.py:444:0
  %o0 : Bool(2, 4, 4, strides=[32, 4, 1], requires_grad=0, device=cpu) = onnx::Slice(%b, %2, %3, %1, %4) # test.py:444:0
  return (%o0)

[05/27/2022-17:11:17] [TRT] [W] onnx2trt_utils.cpp:365: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/27/2022-17:11:18] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.7.3
[05/27/2022-17:11:18] [TRT] [W] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.3.0
Traceback (most recent call last):
  File "test.py", line 464, in <module>
    main()
  File "test.py", line 461, in main
    np.testing.assert_allclose(y_tch, y_trt, err_msg='Torch v.s. TRT failed')
  File "/workspace/.venv/trt-8.4.0.6/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/workspace/.venv/trt-8.4.0.6/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
Torch v.s. TRT failed
Mismatched elements: 17 / 32 (53.1%)
 x: array([[[False,  True,  True,  True],
        [ True, False, False,  True],
        [False,  True,  True,  True],...
 y: array([[[ True,  True,  True,  True],
        [ True,  True, False, False],
        [ True,  True,  True,  True],...
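In the exported graph above, the slice parameters are passed to the Slice node as tensors: onnx::Slice(%b, %2, %3, %1, %4) corresponds to starts=[0], ends=[4], axes=[0], steps=[2]. A minimal NumPy emulation of these ONNX Slice semantics (a sketch for non-negative indices, not the ONNX reference implementation) confirms the expected result shape of [2, 4, 4]:

```python
import numpy as np

def onnx_slice(data, starts, ends, axes, steps):
    # Minimal emulation of ONNX Slice for non-negative starts/ends.
    slices = [slice(None)] * data.ndim
    for start, end, axis, step in zip(starts, ends, axes, steps):
        slices[axis] = slice(start, end, step)
    return data[tuple(slices)]

b = np.random.rand(4, 4, 4) > 0.5
# Same parameters as the Slice node in the exported graph.
y = onnx_slice(b, starts=[0], ends=[4], axes=[0], steps=[2])
assert y.shape == (2, 4, 4)
assert np.array_equal(y, b[0:4:2, :, :])
```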

Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
Alongside, you can try a few things:

1. Validate your model with the snippet below:

check_model.py

import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi @NVES, I have already done what you requested or suggested (the models are shared above and all pass validation), except for your last suggestion of running with trtexec. That is a bit troublesome (as I understand it, I would have to pack the input into a binary file), so I am using the Python API documented here: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation. I think they should behave the same?

Let me know what else I can do.

Hi,

We could reproduce similar results. Meanwhile, could you please try ONNX Runtime and compare the PyTorch model's output with the ONNX model's output, to make sure the ONNX model itself doesn't have any issues.

Thank you.

Thanks for your reply. I compared against ONNX Runtime using the following code snippet, and no errors were raised:

    import onnxruntime as ort
    sess = ort.InferenceSession(
        "output.onnx", providers=['CPUExecutionProvider'])
    y_ort = sess.run(['o0'], {})[0]

    np.testing.assert_allclose(y_ort, y_tch,
                               err_msg='Torch v.s. ORT failed.')
    print('=====> Torch is consistent with ORT')
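Since ONNX Runtime agrees with PyTorch, the mismatch appears to be on the TensorRT side for boolean Slice with step > 1. One idea worth trying as a stopgap (an untested sketch against TensorRT itself; only the numerical equivalence is verified here in NumPy) is to cast the boolean constant to int8 before slicing and cast back afterwards, avoiding the boolean Slice path:

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((4, 4, 4)) > 0.5

# Slice an int8 view instead of the bool tensor, then cast back.
y_workaround = b.astype(np.int8)[0:4:2, :, :].astype(bool)

# Numerically equivalent to slicing the boolean tensor directly.
assert np.array_equal(y_workaround, b[0:4:2, :, :])
```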

Hi @spolisetty, just curious, any follow-ups here?

Hi,

Sorry for the delay. We could reproduce the same issue, and our team will work on it. Please allow us some time.

Thank you.
