We are training our convolutional networks with TensorFlow 2.3 and exporting the models to ONNX using keras2onnx.
A visualization of the beginning of the ONNX model can be seen below.
The input is in NHWC, but since ONNX uses NCHW, the converter inserts a transpose layer before the convolutions.
I would expect TensorRT to remove this transpose layer and execute the convolutions in NHWC on the GPU.
However, when profiling with trtexec, it shows a PushTranspose layer (see below) that also consumes time.
Does this mean the convolutions are actually executed in NCHW? How can I find out what is going on?
I am certain that the GPU is being used, since I saw activity with nvidia-smi.
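For reference, here is a minimal sketch of the export and of checking the exported graph for the inserted Transpose node. The toy model, layer names, and file names below are placeholders, not our actual network:

import tensorflow as tf
import keras2onnx
import onnx

# Tiny stand-in for our network; the real model also takes NHWC input.
inputs = tf.keras.Input(shape=(704, 1280, 3), name="input_1")
x = tf.keras.layers.Conv2D(32, 3, padding="same", name="conv2d")(inputs)
x = tf.keras.layers.LeakyReLU(name="leaky_re_lu")(x)
model = tf.keras.Model(inputs, x, name="toy_model")

# Convert and save with keras2onnx.
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, "model.onnx")

# Print the first nodes of the exported graph; the NHWC->NCHW Transpose
# appears near the input, before the first Conv node.
graph = onnx.load("model.onnx").graph
for node in graph.node[:10]:
    print(node.op_type, node.name)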
Command for profiling
./trtexec --onnx=<model_path.onnx> --int8 --shapes=input_1:1x704x1280x3 --exportTimes=trace.json --dumpProfile --exportProfile=prof.json
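trtexec also accepts --verbose; my assumption is that the verbose build log shows more detail about the layers and any reformat/transpose steps TensorRT actually creates:

./trtexec --onnx=<model_path.onnx> --int8 --shapes=input_1:1x704x1280x3 --verbose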
Beginning of Profile from trtexec
[
{ "count" : 834 }
, { "name" : "(Unnamed Layer* 0) [Constant] + (Unnamed Layer* 1) [Shuffle] + Mul input reformatter 0", "timeMs" : 21.8493, "averageMs" : 0.0261982, "percentage" : 0.929405 }
, { "name" : "(Unnamed Layer* 0) [Constant] + (Unnamed Layer* 1) [Shuffle] + Mul", "timeMs" : 19.3699, "averageMs" : 0.0232253, "percentage" : 0.823939 }
, { "name" : "PushTranspose_1162", "timeMs" : 51.4201, "averageMs" : 0.0616548, "percentage" : 2.18726 }
, { "name" : "conv2d", "timeMs" : 34.2201, "averageMs" : 0.0410313, "percentage" : 1.45563 }
, { "name" : "leaky_re_lu", "timeMs" : 16.6442, "averageMs" : 0.0199571, "percentage" : 0.707997 }
, { "name" : "conv2d_1", "timeMs" : 28.3778, "averageMs" : 0.0340262, "percentage" : 1.20711 }
, { "name" : "leaky_re_lu_1", "timeMs" : 15.0495, "averageMs" : 0.018045, "percentage" : 0.640163 }
Model Start
ONNX model visualized with Netron:
Environment
TensorRT Version: 7.1.3.4
GPU Type: RTX 2080Ti
Nvidia Driver Version: 460
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.3
Baremetal: Yes