Can we make TensorRT handle NHWC tensors?

I’m evaluating TensorRT on a VGG-like model and my input is NCHW.

However, I noticed that TensorRT transforms my model to NHWC for faster inference.

Since my model comes from TensorFlow, can we use NHWC directly as the input format, so that TensorRT doesn’t need an input reformatter?

The input reformatter is very slow when the input is large:
conv1_1_input/Conv2D + (Unnamed Layer* 2) [Activation] input reformatter 0 0.55792
conv1_1_input/Conv2D + (Unnamed Layer* 2) [Activation] 0.98768

An NHWC tensor is also faster than an NCHW tensor: performing a 32x32x3x3 convolution on a tensor of size 1x32x300x1680 gives:
NCHW + FP32: 3 ms on an RTX 2070.
NHWC + FP32: 1.9 ms on an RTX 2070.
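For context, the reformatter TensorRT inserts is a pure memory-layout copy; the data is unchanged, only the element order in memory differs. A minimal NumPy sketch of the NCHW→NHWC conversion (shapes taken from the timings above; everything else is illustrative, not TensorRT code):

```python
import numpy as np

# NCHW tensor matching the benchmark shape above: (N=1, C=32, H=300, W=1680).
nchw = np.arange(1 * 32 * 300 * 1680, dtype=np.float32).reshape(1, 32, 300, 1680)

# NCHW -> NHWC: this full-tensor copy is what the "input reformatter" costs.
nhwc = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))

# Same values, different addresses: element (n, c, h, w) moves to (n, h, w, c).
assert nhwc.shape == (1, 300, 1680, 32)
assert nchw[0, 5, 10, 20] == nhwc[0, 10, 20, 5]
```

The copy touches every element once, so for large activations its latency is comparable to the convolution itself, which matches the profiler numbers above.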

Therefore, can we add NHWC support in TensorRT directly?

Hi,

NHWC is supported in TRT.

Using NHWC can be beneficial depending on the device you use. In general:

  • with Tensor Cores, NHWC is (generally) preferred.
  • without Tensor Cores, NCHW is preferred.

Thanks

Maybe I’m missing something, but on that page I only see NHWC8. There’s also NHWC for plugins, but I didn’t see that we can directly pass an NHWC tensor to a convolution layer.

It also doesn’t mention NHWC for ITensor:
https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#ad26d48b3a534843e9990ab7f903d34a7

Since we pass an ITensor to IConvolutionLayer, is there any way to let TensorRT know it is NHWC?

TensorFormat tells the TensorRT builder how to interpret the input format.
Please refer to the link below:
https://devblogs.nvidia.com/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

Thanks

Can we go back to the original question about NHWC support for the convolution layer, since it is faster on the latest GPUs?

In your blog post there’s nothing about the NHWC format.
The link you provided also sets the input to NCHW:
parser->registerInput("Input_0", DimsCHW(1, 28, 28), UffInputOrder::kNCHW);

When I look at the TensorFlow code, it transposes the tensor from NHWC to NCHW in order to use IConvolutionLayer.
But I know that IConvolutionLayer then tries to transpose it back in order to use Tensor Cores.

In this case, why not make TensorRT support NHWC for IConvolutionLayer?

If you look closely, the input shape used in this case is in NHWC format:
def build_engine(onnx_path, shape=[1, 224, 224, 3])

The same approach can be used in the UFF parser case to set the input order to kNHWC.
Please refer to the link below:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-710-ea/api/c_api/namespacenvuffparser.html#ae6a2a9503d69a32179de572c67b0b8be

Thanks

Please, I’m wondering whether IConvolutionLayer can support NHWC input instead of NCHW input, so that we can avoid any shuffle or reformatter when computing the convolution.

In the blog post the shape is [1, 224, 224, 3], but if you look at the tf2onnx code it refers to, tf2onnx inserts a transpose when converting the TensorFlow graph to ONNX.

I know the UFF parser supports kNHWC, but it transposes to NCHW before passing the tensor to IConvolutionLayer, and that is exactly the additional latency we want to avoid. Do you see what the problem is?

TensorRT uses NCHW uniformly when defining the semantics of operations. You can use the TensorFormat enum to gain access to TensorRT’s internal data layouts at network boundaries, which are optimized for Tensor Cores.

Thanks