trtexec ignores --inputIOFormats with an ONNX model

Description

I have a channels-last TF model, which I convert to ONNX → TRT. When invoking trtexec, even if I set --inputIOFormats=fp32:hwc, the input is still handled as channels-first, and a pair of transposes (from channels-last to channels-first, then from channels-first back to channels-last) is added. How can I get rid of these transposes to get better performance?
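The two transposes described above cancel each other out. This small pure-Python sketch (not TensorRT code) uses the ONNX Transpose `perm` convention, where output axis i takes input axis perm[i]. It shows that NHWC to NCHW followed by NCHW to NHWC composes to the identity permutation, which is why the pair looks redundant. The constant and function names are illustrative.

```python
# Permutations in the ONNX Transpose convention: output axis i takes input axis perm[i].
NHWC_TO_NCHW = (0, 3, 1, 2)
NCHW_TO_NHWC = (0, 2, 3, 1)


def apply_perm(shape, perm):
    """Return the shape produced by transposing `shape` with `perm`."""
    return tuple(shape[axis] for axis in perm)


def compose(first, second):
    """Return the single permutation equivalent to applying `first`, then `second`."""
    return tuple(first[axis] for axis in second)


nchw = apply_perm((1, 224, 224, 3), NHWC_TO_NCHW)  # (1, 3, 224, 224)
assert apply_perm(nchw, NCHW_TO_NHWC) == (1, 224, 224, 3)
assert compose(NHWC_TO_NCHW, NCHW_TO_NHWC) == (0, 1, 2, 3)  # identity: the pair is a no-op
```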

Environment

TensorRT Version: 8.5.1
GPU Type: RTX4000
Nvidia Driver Version: 525
CUDA Version: 11.8
CUDNN Version: 8.6
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable): 2.12
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Steps To Reproduce

  1. Run this Python file
import os

import tensorflow as tf
import numpy as np

SAVED_MODEL_DIR = "/tmp/resnet50"
ONNX_MODEL_PATH = SAVED_MODEL_DIR + ".onnx"

class ResNet50(tf.Module):
    def __init__(self):
        super().__init__()
        self.model = tf.keras.applications.resnet50.ResNet50(
            weights="imagenet",
            include_top=True
        )

    @tf.function
    def forward(self, inputs):
        return self.model(inputs, training=False)

resnet = ResNet50()
input_batch = np.float32(np.random.rand(1, 224, 224, 3))
print("tf result", resnet.forward(input_batch)[0, 0])

# Save saved model.
tensor_specs = [tf.TensorSpec((1, 224, 224, 3), tf.float32)]
call_signature = resnet.forward.get_concrete_function(*tensor_specs)

os.makedirs(SAVED_MODEL_DIR, exist_ok=True)
print(f"Saving {SAVED_MODEL_DIR} with call signature: {call_signature}")
tf.saved_model.save(resnet, SAVED_MODEL_DIR,
                    signatures={"serving_default": call_signature})

# convert to onnx
assert os.system(
    f"python -m tf2onnx.convert --saved-model {SAVED_MODEL_DIR} --output {ONNX_MODEL_PATH}") == 0

# convert to trt
assert os.system(f"trtexec --onnx={ONNX_MODEL_PATH} --verbose --inputIOFormats=fp32:hwc") == 0
  2. The verbose log shows that a Transpose is added:
StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6 [Transpose] inputs: [inputs -> (1, 224, 224, 3)[FLOAT]],
  3. Remove --inputIOFormats=fp32:hwc and rerun: the resulting engine is exactly the same, which means the flag has no effect.
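To compare runs with and without the flag, the Transpose lines can be pulled out of a saved verbose log automatically. This is a minimal sketch that assumes the log text was captured to a string or file. The regular expression is inferred from the log excerpts quoted in this thread, not from any documented log format, and `find_input_transposes` is a hypothetical helper name. Lines of the form "<node> [Transpose] inputs: [...]" appear to be printed while the ONNX parser imports nodes, so a hit suggests the Transpose is already present in the ONNX graph (an interpretation, not a confirmed fact).

```python
import re

# Matches verbose-log lines such as:
#   [V] [TRT] <node name> [Transpose] inputs: [inputs -> (1, 224, 224, 3)[FLOAT]],
# The pattern is inferred from the log excerpts in this thread, not from a spec.
TRANSPOSE_LINE = re.compile(
    r"\[TRT\] (?P<node>\S+) \[Transpose\] inputs: \[(?P<tensor>\S+) -> \((?P<shape>[^)]*)\)"
)


def find_input_transposes(log_text, input_name="inputs"):
    """Return (node name, input shape) for each Transpose fed directly by `input_name`."""
    hits = []
    for line in log_text.splitlines():
        match = TRANSPOSE_LINE.search(line)
        if match and match.group("tensor") == input_name:
            shape = tuple(int(dim) for dim in match.group("shape").split(","))
            hits.append((match.group("node"), shape))
    return hits


log = ("[08/14/2023-13:22:27] [V] [TRT] StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6 "
       "[Transpose] inputs: [inputs -> (1, 224, 224, 3)[FLOAT]], ")
print(find_input_transposes(log))
# [('StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6', (1, 224, 224, 3))]
```

Running it on the logs from both builds (with and without --inputIOFormats) should report the same node if the flag is not changing how the input is imported.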

Hi,
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

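import sys
import onnx

# Usage: python check_model.py <path/to/model.onnx>
filename = sys.argv[1]
model = onnx.load(filename)
# Raises onnx.checker.ValidationError if the model is invalid.
onnx.checker.check_model(model)
print("The model is valid.")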
  2. Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Thanks! The model runs without any problem; it just has redundant transpose/reformat layers.

I have shared both the ONNX model and the verbose logs at https://drive.google.com/drive/folders/1lS0N2QuGY2UmC4sXDhZgPnG7jIlYwHAq?usp=drive_link. Please take a look.

Hi,

Could you please try the latest TensorRT version, 8.6.1, and let us know if you still face the same issue?

Thank you.

I upgraded to 8.6.1 and retried, and I still see the same issue:

[08/14/2023-13:22:27] [V] [TRT] Searching for input: inputs
[08/14/2023-13:22:27] [V] [TRT] StatefulPartitionedCall/resnet50/conv1_conv/Conv2D__6 [Transpose] inputs: [inputs -> (1, 224, 224, 3)[FLOAT]], 

Friendly ping

TRT is using the best global schedule it can find, and that may involve introducing transposes. As a result, the assumption that eliminating transposes will improve performance is incorrect.

But when I visualize the graph, I get this:

We can see that TRT first shuffles NHWC to NCHW (not sure whether this is a bug in the visualization), then reformats it back to NHWC4. At the very least, the Shuffle and the Reformat should be fused.
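The fusion being asked for can be illustrated on flat buffers. This is a minimal sketch that assumes "NHWC4" in the visualization means NHWC with the channel dimension zero-padded to a multiple of 4; that reading of the label is an assumption, not TensorRT's documented definition. Plain Python lists stand in for device buffers, and all function names are hypothetical. The Shuffle-then-Reformat path and a single direct channel pad produce identical buffers, so in principle the two layers could be one copy.

```python
def nhwc_to_nchw(buf, n, h, w, c):
    """Shuffle: reorder a flat NHWC buffer into NCHW order."""
    out = [0.0] * (n * c * h * w)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    out[((ni * c + ci) * h + hi) * w + wi] = buf[((ni * h + hi) * w + wi) * c + ci]
    return out


def nchw_to_nhwc_padded(buf, n, h, w, c, vec=4):
    """Reformat: flat NCHW buffer -> NHWC with channels zero-padded to a multiple of `vec`."""
    cp = -(-c // vec) * vec  # round c up to a multiple of vec
    out = [0.0] * (n * h * w * cp)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    out[((ni * h + hi) * w + wi) * cp + ci] = buf[((ni * c + ci) * h + hi) * w + wi]
    return out


def nhwc_to_nhwc_padded(buf, n, h, w, c, vec=4):
    """Fused path: pad the channels of an NHWC buffer directly, skipping the NCHW round trip."""
    cp = -(-c // vec) * vec
    out = [0.0] * (n * h * w * cp)
    for pixel in range(n * h * w):
        out[pixel * cp: pixel * cp + c] = buf[pixel * c: (pixel + 1) * c]
    return out


n, h, w, c = 1, 2, 3, 3
x = [float(i) for i in range(n * h * w * c)]
two_step = nchw_to_nhwc_padded(nhwc_to_nchw(x, n, h, w, c), n, h, w, c)
assert two_step == nhwc_to_nhwc_padded(x, n, h, w, c)
```

The fused version touches each input element once, while the two-step version materializes an intermediate NCHW buffer.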

Thank you for reporting this. It is a missing optimization, and we'll continue to work on it.


This is nice! Thank you!