An FCN with 360K parameters is much slower than ResNet-50 with 23M parameters

Description

I generated an ONNX model from TensorFlow 2.0. trtexec reports 7.45 ms @ FP32 for a simple FCN with 7 conv2d layers, 360K parameters, and a 1x256x256x3 input (the output is 1x256x256x1). By contrast, ResNet-50 (the official NVIDIA ONNX model) gives 3.45 ms @ FP32 with 23M parameters and a 1x224x224x3 input. Both were measured on the same T4 system. Going over the log files, the tactics TensorRT picks for my model appear to be much slower than the ones it picks for ResNet-50.
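For reference, a trtexec invocation along these lines reproduces this kind of measurement (a sketch, not necessarily the exact command used; all flags shown exist in TensorRT 7, and the workspace size is arbitrary):

trtexec --onnx=model_fcn.onnx --explicitBatch --workspace=1024 --verbose --dumpProfile

--verbose makes the builder log the tactics it selects, and --dumpProfile prints per-layer timings.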

Environment

TensorRT Version: 7.0
GPU Type: T4
Nvidia Driver Version: 440.64
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable): 2.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): NVIDIA TensorRT 20.03

Hi,

Could you please check the performance of the generated ONNX model before converting it to a TRT engine?
Also, if possible, please share the model/script file so that we can help better.
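For example, with onnxruntime installed, something along these lines would give a baseline number (a rough sketch; the ONNX file name is a placeholder, and the input name is read from the model rather than guessed):

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model_fcn.onnx")   # placeholder path to your exported model
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 256, 256, 3).astype(np.float32)  # NHWC input, as in the Keras model

# Warm up, then average over repeated runs
for _ in range(10):
    sess.run(None, {input_name: x})
start = time.time()
for _ in range(100):
    sess.run(None, {input_name: x})
print("mean latency: %.2f ms" % ((time.time() - start) / 100 * 1000))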

Thanks

In what way should I measure the performance? Directly from TensorFlow?

My FCN model was created in TF 2.0. I first froze the exported pb file according to the instructions here:


and then used tf2onnx to produce an ONNX model. I was not able to generate the ONNX model directly from the SavedModel format, nor was I able to use TF-TRT to measure performance directly from TF 2.0 (following the instructions in https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#worflow-with-savedmodel); I keep receiving errors about "tag_constants.SERVING".
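The tf2onnx step itself was a one-liner along these lines (a sketch; the frozen-graph file name and the output node name Identity:0 are assumptions based on how TF 2.0 typically names frozen outputs):

python -m tf2onnx.convert --input frozen_graph.pb --inputs image:0 --outputs Identity:0 --output model_fcn.onnx --opset 11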

Here is the TF 2.0 code for my FCN with 7 conv2d layers:

from tensorflow.keras.layers import Conv2D, Input

image_dim = 256
input_shape_image = (image_dim, image_dim, 3)
input_image = Input(shape=input_shape_image, name='image')

n1 = Conv2D(64, (3, 3), padding='same', activation='relu')(input_image)
n2 = Conv2D(128, (3, 3), padding='same', activation='relu')(n1)
n3 = Conv2D(128, (3, 3), padding='same', activation='relu')(n2)
n4 = Conv2D(64, (3, 3), padding='same', activation='relu')(n3)
n5 = Conv2D(64, (3, 3), padding='same', activation='relu')(n4)
n6 = Conv2D(32, (3, 3), padding='same', activation='relu')(n5)
n7 = Conv2D(16, (3, 3), padding='same', activation='relu')(n6)
g = Conv2D(1, (3, 3), padding='same', activation='sigmoid')(n7)
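The snippet above only builds the layer graph; wrapping it in a Model and saving it looks roughly like this (a sketch; the save path is a placeholder):

from tensorflow.keras.models import Model

# Wrap the layer graph in an exportable model
model = Model(inputs=input_image, outputs=g, name='fcn')
model.summary()                # prints the layer table shown further below
model.save('saved_model_fcn')  # SavedModel directory; placeholder path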

Here is the trtexec log file: model_fcn.log (93.4 KB)
And here is a link to the ONNX model: https://www.dropbox.com/s/ygkvrluoakvvlqp/model_fcn.onnx?dl=0

Model summary:

Layer (type)         Output Shape            Param #
image (InputLayer)   [(None, 256, 256, 3)]   0
conv2d (Conv2D)      (None, 256, 256, 64)    1792
conv2d_1 (Conv2D)    (None, 256, 256, 128)   73856
conv2d_2 (Conv2D)    (None, 256, 256, 128)   147584
conv2d_3 (Conv2D)    (None, 256, 256, 64)    73792
conv2d_4 (Conv2D)    (None, 256, 256, 64)    36928
conv2d_5 (Conv2D)    (None, 256, 256, 32)    18464
conv2d_6 (Conv2D)    (None, 256, 256, 16)    4624
conv2d_7 (Conv2D)    (None, 256, 256, 1)     145

Total params: 357,185
Trainable params: 357,185
Non-trainable params: 0


Yes, can you try running the TF and ONNX models to compare the performance before and after conversion?
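For the TF side, a rough sketch along the same lines as the onnxruntime snippet above (the SavedModel path is a placeholder):

import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("saved_model_fcn")  # placeholder path
x = np.random.rand(1, 256, 256, 3).astype(np.float32)

# Warm up, then average over repeated calls
for _ in range(10):
    model(x, training=False)
start = time.time()
for _ in range(100):
    model(x, training=False)
print("mean latency: %.2f ms" % ((time.time() - start) / 100 * 1000))

Comparable numbers from the two scripts would tell us whether the slowdown comes from the conversion or from the TensorRT build.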

Thanks