Does TensorRT rewrite ONNX models to NHWC?

jean.wanka · January 19, 2021, 11:05am

We are training our convolutional networks with TensorFlow 2.3 and exporting the models to ONNX using keras2onnx.
A visualization of the beginning of the ONNX model can be seen below.
The input is in NHWC, but since ONNX uses NCHW, the exporter adds a transpose layer before the convolutions.
I would expect TensorRT to remove this transpose layer and execute the convolutions in NHWC on the GPU.
However, profiling with trtexec shows a PushTranspose layer (see below) that also consumes time.

Does this mean the convolutions are actually executed in NCHW? If not, how can I find out what is going on?
I am certain that the GPU is used since I saw activity with nvidia-smi.

Command for profiling

./trtexec --onnx=<model_path.onnx> --int8 --shapes=input_1:1x704x1280x3 --exportTimes=trace.json --dumpProfile --exportProfile=prof.json

Beginning of Profile from trtexec

[
  { "count" : 834 }
, { "name" : "(Unnamed Layer* 0) [Constant] + (Unnamed Layer* 1) [Shuffle] + Mul input reformatter 0", "timeMs" : 21.8493, "averageMs" : 0.0261982, "percentage" : 0.929405 }
, { "name" : "(Unnamed Layer* 0) [Constant] + (Unnamed Layer* 1) [Shuffle] + Mul", "timeMs" : 19.3699, "averageMs" : 0.0232253, "percentage" : 0.823939 }
, { "name" : "PushTranspose_1162", "timeMs" : 51.4201, "averageMs" : 0.0616548, "percentage" : 2.18726 }
, { "name" : "conv2d", "timeMs" : 34.2201, "averageMs" : 0.0410313, "percentage" : 1.45563 }
, { "name" : "leaky_re_lu", "timeMs" : 16.6442, "averageMs" : 0.0199571, "percentage" : 0.707997 }
, { "name" : "conv2d_1", "timeMs" : 28.3778, "averageMs" : 0.0340262, "percentage" : 1.20711 }
, { "name" : "leaky_re_lu_1", "timeMs" : 15.0495, "averageMs" : 0.018045, "percentage" : 0.640163 }
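
One way to read a profile like this is to total the time of layers whose names suggest layout conversion. The sketch below assumes the file written by --exportProfile is a JSON list shaped like the excerpt above: one record holding "count", followed by one record per layer with "name" and "timeMs". Matching on the substrings "reformat" and "transpose" is only a naming heuristic, not something TensorRT guarantees, and the function names are hypothetical.

```python
import json

# Heuristic (an assumption): layer names containing these substrings
# are treated as layout-conversion work.
LAYOUT_HINTS = ("reformat", "transpose")


def load_profile(path):
    """Parse a profile file assumed to be a JSON list of records."""
    with open(path) as f:
        return json.load(f)


def summarize_profile(entries, hints=LAYOUT_HINTS):
    """Split total layer time into layout-related time and everything else."""
    # Skip the leading {"count": N} record, which has no layer name.
    layers = [e for e in entries if "name" in e]
    total_ms = sum(e["timeMs"] for e in layers)
    layout = [e for e in layers
              if any(h in e["name"].lower() for h in hints)]
    layout_ms = sum(e["timeMs"] for e in layout)
    return {
        "total_ms": total_ms,
        "layout_ms": layout_ms,
        "layout_share": layout_ms / total_ms if total_ms else 0.0,
        "layout_layers": [e["name"] for e in layout],
    }
```

Calling summarize_profile(load_profile("prof.json")) on the full file would then report which layers matched and what share of the summed layer time they account for. On the excerpt above, the heuristic would flag the "input reformatter" entry and PushTranspose_1162.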

Model Start

Onnx model visualized with Netron:

Environment

TensorRT Version: 7.1.3.4
GPU Type: RTX 2080Ti
Nvidia Driver Version: 460
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.3
Baremetal: Yes

NVES · January 19, 2021, 11:07am

Hi, please share the ONNX model and the script so that we can assist you better.

In the meantime, you can validate your model with the snippet below.

check_model.py

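import onnx

filename = "yourONNXmodel"  # placeholder: replace with the path to your ONNX file
model = onnx.load(filename)
# Raises an exception if the model is not a valid ONNX graph.
onnx.checker.check_model(model)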

Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks!

jean.wanka · January 19, 2021, 11:10am

Hi, thanks for your reply. Is there a way to share the model privately? If so, I can provide an example ONNX model.

jean.wanka · January 19, 2021, 11:23am

The model is valid. As I described, I used trtexec to get the profile. I will share the model with randomly initialized weights; I just have to export it again and will get back to you in a couple of hours.

Thanks!

spolisetty · January 20, 2021, 11:09am

Hi @jean.wanka,

Please send the model as an attachment in a DM.

Thank you.

jean.wanka · January 20, 2021, 11:17am

Thanks, I shared the model in the DM.
If it is necessary I can also create a script that creates a reduced version of this model and uses keras2onnx to export it.

jean.wanka · January 25, 2021, 6:08pm

Is there any update on this?
The main point I’m trying to understand is what the engine Builder (IBuilder) does in detail and how it rewrites and optimizes the graph.

Is it able to:

remove layers like unnecessary transposes?
rewrite the graph from channel first to an equivalent channel last graph?
fuse layers like Convolution and BatchNorm? Where is it listed what is supported here?

thanks!

spolisetty · January 26, 2021, 6:32am

Hi @jean.wanka,

ONNX GraphSurgeon can be used to modify the ONNX file manually.
In FP16, conv + LeakyReLU can be fused together. In INT8, in some cases we do not fuse conv and activation because of register pressure (the extra registers required would reduce occupancy).

Thank you.

jean.wanka · January 26, 2021, 9:29am

Hi @spolisetty ,
thanks for the update!

One thing that is still not clear to me is the channels-first/channels-last question.
I’ve read in your documentation that channels last (NHWC) is preferred.
ONNX, however, only uses the channels-first layout. Does this mean the TensorRT engine is also always in channels-first layout?
Is there a way to change this, or are the benefits not significant?

jean.wanka · February 1, 2021, 8:26am

Any update on this would be much appreciated.

thanks!

spolisetty · February 1, 2021, 10:09am

Hi @jean.wanka,

The TRT engine does not always use the channels-first layout.
It depends on the kernel implementation: TRT always inserts a reformat when adjacent layers have mismatched kernel I/O formats.

Thank you.
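
As a rough illustration of that rule, here is a toy model in Python. It is not TensorRT's actual algorithm: the layer names, the per-layer formats, and the function insert_reformats are all made up for the example. Each layer declares the layout its chosen kernel consumes and produces, and a reformat step is added wherever the incoming layout differs from what the next kernel expects.

```python
def insert_reformats(layers, input_format):
    """Return an execution plan with reformat steps added at format mismatches.

    `layers` is a list of (name, input_format, output_format) tuples that
    describe the layout each (hypothetical) kernel consumes and produces.
    """
    plan = []
    current = input_format
    for name, fmt_in, fmt_out in layers:
        if fmt_in != current:
            # Adjacent kernels disagree on layout, so a conversion is needed.
            plan.append(f"reformat {current}->{fmt_in}")
        plan.append(name)
        current = fmt_out
    return plan


# Hypothetical kernel choices: the scale layer runs in NCHW,
# the convolution and activation kernels run in NHWC.
layers = [
    ("scale", "NCHW", "NCHW"),
    ("conv2d", "NHWC", "NHWC"),
    ("leaky_re_lu", "NHWC", "NHWC"),
]
plan = insert_reformats(layers, input_format="NCHW")
# plan == ["scale", "reformat NCHW->NHWC", "conv2d", "leaky_re_lu"]
```

In this toy model only one conversion appears, at the boundary where the layouts change; layers that agree on a layout run back to back without one.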

OnePieceOfDeepLearning · August 3, 2023, 7:00am

@spolisetty
Is it better to use the NHWC layout with ONNX to avoid reformat layers?
