Troubleshooting Suggestions for ONNX v. TensorRT discrepancies

Hello all! As the topic title suggests, I have noticed some prediction discrepancies between what should otherwise be identical models. As I can generate a TensorRT 5.1.5 engine based on that model without error, I am left without an obvious place or strategy to hunt down the issue. My workflow (and environment description) is below (all using the relevant Python APIs):

  1. Generate .pb from a TF-1.13.1 + Keras model, after disabling training nodes & freezing.
  2. Generate .onnx (both NCHW & NHWC) from .pb
  3. Ensure that predictions are identical between “TF + .pb” and “ONNX Runtime + .onnx”
  4. Generate .trt (both NCHW & NHWC) from .onnx
  5. Ensure that predictions are identical between “ONNX Runtime + .onnx” and “TensorRT 5.1.5 + .onnx”

I make it all the way to (5) without issue, warning, or error; but the predictions from “ONNX Runtime + .onnx” are correct and the predictions from “TensorRT 5.1.5 + .onnx” are incorrect.
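(For reference, the prediction checks in steps 3 and 5 boil down to something like the sketch below. The input here is random, the file names are placeholders, and run_trt_engine stands in for the TensorRT-side inference code, so only the overall shape of the comparison is meant to be accurate.)

import numpy as np
import onnxruntime as ort

# Placeholder input; the real shape/dtype depend on the model
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)

# ONNX Runtime prediction
sess = ort.InferenceSession("model.onnx")
input_name = sess.get_inputs()[0].name
ort_pred = sess.run(None, {input_name: dummy})[0]

# TensorRT prediction (run_trt_engine is a placeholder for the TRT-side code)
trt_pred = run_trt_engine("model.trt", dummy)

print(np.allclose(ort_pred, trt_pred, atol=1e-5))
print(np.abs(ort_pred - trt_pred).max())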

I figure this will be hard, if not impossible, for anyone here to debug remotely, which I completely understand. So, in lieu of that, I was wondering if anyone could suggest any tools or techniques I can use to hunt down the discrepancy? Incidentally, this is the second model I have shepherded through this process; the first worked perfectly using nearly identical code… so I’m proper stuck :).

Apologies for such a general question, and thanks for your time and patience!

> dpkg -l | grep TensorRT    
ii  graphsurgeon-tf                     5.1.5-1+cuda10.1       amd64                  GraphSurgeon for TensorRT package
ii  libnvinfer-dev                      5.1.5-1+cuda10.1       amd64                  TensorRT development libraries and headers
ii  libnvinfer-samples                  5.1.5-1+cuda10.1       all                    TensorRT samples and documentation
ii  libnvinfer5                         5.1.5-1+cuda10.1       amd64                  TensorRT runtime libraries
ii  python3-libnvinfer                  5.1.5-1+cuda10.1       amd64                  Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev              5.1.5-1+cuda10.1       amd64                  Python 3 development package for TensorRT
ii  tensorrt                            5.1.5.0-1+cuda10.1     amd64                  Meta package of TensorRT
ii  uff-converter-tf                    5.1.5-1+cuda10.1       amd64                  UFF converter for TensorRT package


> pip show tensorflow onnx
Name: tensorflow
Version: 1.13.1
<snip>

Name: onnx
Version: 1.6.0
<snip>

Hi Austin,

Would it be possible to try running your scripts (steps 4 and 5) in the latest TensorRT 6.0 container (to avoid messing up the current installation on your host) and see if your new TRT6 engine gets the expected predictions there?

nvidia-docker run -it -v ${PWD}:/mnt nvcr.io/nvidia/tensorrt:19.09-py3

It’s possible that something fixed between versions is causing your problems.

You can see more details about the TensorRT container and how to install docker/nvidia-docker here: TensorRT | NVIDIA NGC

Thanks,
NVIDIA Enterprise Support

Will do. My work is currently based on nvcr.io/nvidia/tensorrt:19.08-py3, so I will bump that and check in the morning! Thanks!

Ok, finally got everything up and running, and I am still getting the same results. However, I am getting the following warning that I wasn’t getting before:

[TensorRT] WARNING: Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.

This seems relevant, but I haven’t found any good description of what exactly might be causing it or how to mitigate it.

> dpkg -l | grep TensorRT
ii  graphsurgeon-tf                                                             6.0.1-1+cuda10.1                           amd64                                      GraphSurgeon for TensorRT package
ii  libnvinfer-bin                                                              6.0.1-1+cuda10.1                           amd64                                      TensorRT binaries
ii  libnvinfer-dev                                                              6.0.1-1+cuda10.1                           amd64                                      TensorRT development libraries and headers
ii  libnvinfer-doc                                                              6.0.1-1+cuda10.1                           all                                        TensorRT documentation
ii  libnvinfer-plugin-dev                                                       6.0.1-1+cuda10.1                           amd64                                      TensorRT plugin libraries
ii  libnvinfer-plugin6                                                          6.0.1-1+cuda10.1                           amd64                                      TensorRT plugin libraries
ii  libnvinfer-samples                                                          6.0.1-1+cuda10.1                           all                                        TensorRT samples
ii  libnvinfer6                                                                 6.0.1-1+cuda10.1                           amd64                                      TensorRT runtime libraries
ii  libnvonnxparsers-dev                                                        6.0.1-1+cuda10.1                           amd64                                      TensorRT ONNX libraries
ii  libnvonnxparsers6                                                           6.0.1-1+cuda10.1                           amd64                                      TensorRT ONNX libraries
ii  libnvparsers-dev                                                            6.0.1-1+cuda10.1                           amd64                                      TensorRT parsers libraries
ii  libnvparsers6                                                               6.0.1-1+cuda10.1                           amd64                                      TensorRT parsers libraries
ii  python3-libnvinfer                                                          6.0.1-1+cuda10.1                           amd64                                      Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                                                      6.0.1-1+cuda10.1                           amd64                                      Python 3 development package for TensorRT
ii  tensorrt                                                                    6.0.1.4-1+cuda10.1                         amd64                                      Meta package of TensorRT
ii  uff-converter-tf                                                            6.0.1-1+cuda10.1                           amd64                                      UFF converter for TensorRT package

> pip show tensorflow onnx
Name: tensorflow
Version: 1.13.1
<snip>

Name: onnx
Version: 1.6.0
<snip>

Hi Austin,

I guess the first level of debugging could be to try viewing the resulting network created by the ONNX parser and comparing it with the layers in the original ONNX model. Assuming you’re using the Python APIs as you mentioned, you could do something like this:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)


def print_network(network):
    for i in range(network.num_layers):
        layer = network.get_layer(i)

        print("\nLAYER {}".format(i))
        print("===========================================")
        layer_input = layer.get_input(0)
        if layer_input:
            print("\tInput Name:  {}".format(layer_input.name))
            print("\tInput Shape: {}".format(layer_input.shape))

        layer_output = layer.get_output(0)
        if layer_output:
            print("\tOutput Name:  {}".format(layer_output.name))
            print("\tOutput Shape: {}".format(layer_output.shape))
        print("===========================================")


with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.OnnxParser(network, TRT_LOGGER) as parser:
    ...
    # Fill network attributes with information by parsing the model
    with open("model.onnx", "rb") as f:
        parser.parse(f.read())

    print_network(network)

To view the original ONNX model, there seem to be a few tools online:

  • https://github.com/lutzroeder/netron
  • https://github.com/onnx/tutorials/blob/master/tutorials/VisualizingAModel.md
  • or maybe you can just do something similar to the code above in your ONNX runtime code
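For that last option, a rough sketch using the onnx Python package (rather than ONNX Runtime itself) to dump the graph nodes could look like this:

import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Print each node's op type and tensor names for comparison with the TRT network
for i, node in enumerate(model.graph.node):
    print("NODE {}: {}".format(i, node.op_type))
    print("\tInputs:  {}".format(list(node.input)))
    print("\tOutputs: {}".format(list(node.output)))

Note that the ONNX nodes won’t necessarily line up one-to-one with the TensorRT layers, since the parser can fuse or rewrite ops, but the tensor names usually make it possible to match them up.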

Alternatively, you could try producing your ONNX->TRT engine with the trtexec tool to see if that gives you something different (probably not), which would indicate a mistake in your Python code. The binary comes with the container, and the command would look something like:

trtexec --onnx=model.onnx --engine=model.engine

Thanks NVES_R!

This helped me solve my issue! It turns out I was mishandling the output during a resize operation. I wouldn’t have thought to look there if not for the great debugging info you sent, which convinced me that the network was indeed correct.

In addition, I modified it a bit to make the output easier to parse:

import json


def store_network_description(network, trt_network_description):
    network_description = {}

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        network_description[i] = {
            a: repr(getattr(layer, a)) for a in
            ('name', 'type', 'precision', 'precision_is_set', 'num_inputs',
             'num_outputs')
        }
        network_description[i]['inputs'] = {
            j: {
                a: repr(getattr(layer.get_input(j), a)) for a in
                ('name', 'shape', 'dtype')
            } for j in range(layer.num_inputs)
        }
        network_description[i]['outputs'] = {
            j: {
                a: repr(getattr(layer.get_output(j), a)) for a in
                ('name', 'shape', 'dtype')
            } for j in range(layer.num_outputs)
        }

    with open(trt_network_description, 'w') as fp:
        print(f"Writing {trt_network_description}")
        json.dump(network_description, fp, indent=4, sort_keys=True)
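For completeness, it gets called right after parser.parse(), e.g. store_network_description(network, "trt_network.json"); the filename is arbitrary.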

Out of curiosity, do you have a good idea how to access and display the TensorFormat for each input/output? I think that would be useful to visualize as well.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/infer/Graph/LayerBase.html#tensorrt.TensorFormat

Thanks again! And I hope you have a great week!
~A

Hi Austin,

I’m not too familiar with the tensorrt.TensorFormat, but after some digging I found 2 possible solutions.

(1) Pass the engine to your function, and outside of the for loop you can do this:

...
    network_description = {}

    # (1) Get formats from engine bindings
    print("==== Engine Bindings ====")
    for i in range(engine.num_bindings):
        print("Binding", i)
        print("\tName:", engine.get_binding_name(i))
        print("\tShape:", engine.get_binding_shape(i))
        print("\tFormat:", str(engine.get_binding_format(i)))
        print("\tFormatDescription:", engine.get_binding_format_desc(i))

Which will output something like this:

==== Engine Bindings ====
Binding 0
	Name: gpu_0/data_0
	Shape: (3, 224, 224)
	Format: TensorFormat.LINEAR
	FormatDescription: Row major linear FP32 format (kLINEAR)
Binding 1
	Name: gpu_0/softmax_1
	Shape: (1000,)
	Format: TensorFormat.LINEAR
	FormatDescription: Row major linear FP32 format (kLINEAR)

But for my test network, it seems like the bindings only exist for the input and output layers, not any of the intermediate layers. I’m not too familiar with what these bindings are.

My attempt at getting info on the intermediate layers was this:

(2) Inside the for loop, get the layer’s input/output “allowed_formats” attribute, which is encoded as the bitwise OR of one bit per allowed format enum value (1 << int(format)).

def bits_to_names(allowed_format_bits):
    formats = [trt.TensorFormat.LINEAR,
               trt.TensorFormat.CHW2,
               trt.TensorFormat.HWC8,
               trt.TensorFormat.CHW4,
               trt.TensorFormat.CHW16,
               trt.TensorFormat.CHW32]

    print("{:20}  {:4}  {:6}".format("Name", "Enum", "Bit"))
    for fmt in formats:
        fmt_bit = 1 << int(fmt)
        if fmt_bit & allowed_format_bits:
            print("{:20}  {:4}  {:06b}".format(str(fmt), int(fmt), fmt_bit))
...

        # (2) Get allowed formats for input/output tensor of each layer
        print("Layer", i, "Input Allowed Formats")
        layer_input = layer.get_input(0)
        if layer_input:
            bits_to_names(layer_input.allowed_formats)
        else:
            print("No input")

        print("Layer", i, "Output Allowed Formats")
        layer_output = layer.get_output(0)
        if layer_output:
            bits_to_names(layer_output.allowed_formats)
        else:
            print("No output")

Which will output something like this:

Layer 178 Output Allowed Formats
Name                  Enum  Bit   
TensorFormat.LINEAR      0  000001
TensorFormat.CHW2        1  000010
TensorFormat.HWC8        2  000100
TensorFormat.CHW4        3  001000
TensorFormat.CHW16       4  010000
TensorFormat.CHW32       5  100000

But I didn’t find this one particularly helpful, because it reported that every format was allowed for every single layer. Maybe this will be different for your network.


According to the engineering team, “There is no way to get information about intermediate layers that isn’t passed to the logger.”
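If you do want more per-layer information passed to the logger at build time, one option (a sketch assuming the TRT 6 Python API) is to construct the logger with a higher verbosity:

import tensorrt as trt

# VERBOSE asks the builder to emit more per-layer detail to the logger
TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)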

Thanks,
NVIDIA Enterprise Support