BERT-based model optimization with TF-TRT on TF 1.15.2 and TensorRT 5.1

Description

I would like to optimize a BERT-based TensorFlow model that I have trained on TF 1.15.2. It’s a base BERT with a logistic-regression layer on top of it.

I am trying to optimize it using TF-TRT to start with, and I have exported the model to a .pb file using TensorFlow.

Environment

TensorRT Version: 5.1.5
GPU Type: V100
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 1.15.2
PyTorch Version (if applicable): n/a
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:19.08-py3

Steps To Reproduce

I am running the following commands:

from tensorflow.python.compiler.tensorrt import trt_convert as trt
2020-06-02 18:56:32.762315: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
2020-06-02 18:56:32.763013: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.5

converter = trt.TrtGraphConverter(input_saved_model_dir="./1588775388")
2020-06-02 18:56:35.956493: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
INFO:tensorflow:Linked TensorRT version: (5, 1, 5)
INFO:tensorflow:Loaded TensorRT version: (5, 1, 5)
INFO:tensorflow:Running against TensorRT version 5.1.5

converter.convert()
After a while I get the following:

2020-06-02 18:56:58.223777: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-06-02 18:56:58.223852: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can’t identify the cuda device. Running on device 0
2020-06-02 18:56:58.223959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.5
2020-06-02 18:56:58.413018: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.5
2020-06-02 18:57:03.623745: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 27 nodes succeeded.

and at the end I get this:

2020-06-02 18:57:16.998789: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 45 nodes (0), 48 edges (0), time = 35.247ms.
2020-06-02 18:57:16.998804: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] TensorRTOptimizer: Graph size after: 45 nodes (0), 48 edges (0), time = 3.794ms.
2020-06-02 18:57:16.998815: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 45 nodes (0), 48 edges (0), time = 36.346ms.
[libprotobuf ERROR google/protobuf/io/zero_copy_stream_impl_lite.cc:155] Cannot allocate buffer larger than kint32max for StringOutputStream.
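If I read this right, the last error comes from protobuf rather than TensorRT itself: protobuf cannot serialize a single message larger than kint32max bytes (~2 GiB). A quick sanity check of the numbers (the ~110M parameter count for BERT-base is an approximation):

```python
# protobuf refuses to serialize any single message larger than kint32max bytes.
kint32max = 2**31 - 1
print(kint32max)  # 2147483647, i.e. ~2 GiB

# BERT-base has roughly 110M parameters; stored as fp32 constants the raw
# weights are ~440 MB, comfortably under the cap. The limit is presumably
# only exceeded once TF-TRT embeds serialized engine data and duplicated
# constants into the same GraphDef.
approx_params = 110_000_000
weight_bytes_fp32 = approx_params * 4
print(weight_bytes_fp32)              # 440000000
print(weight_bytes_fp32 < kint32max)  # True
```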

==============

Could you help me figure out the issue?

Also, is there any other avenue I could follow to optimize the model, other than TF-TRT?
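For completeness, here is the conversion written out as a full script. The precision mode, workspace size, and segment size are values I am experimenting with, not known-good settings; is_dynamic_op=True is reportedly a way to avoid serializing TensorRT engines into the GraphDef (which may be relevant to the kint32max error above):

```python
# Sketch of a TF-TRT 1.x conversion (TF 1.15 API). Parameter values are
# guesses to experiment with, not recommendations.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="./1588775388",
    precision_mode="FP16",            # or "FP32" / "INT8"
    is_dynamic_op=True,               # build engines at runtime instead of
                                      # embedding serialized engines in the graph
    max_workspace_size_bytes=1 << 30, # 1 GiB scratch space for TensorRT
    minimum_segment_size=3,           # skip tiny subgraphs not worth converting
)
converter.convert()
converter.save("./1588775388_trt")    # writes a new SavedModel
```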

Can you try the solution recommended in the link below:

You can use TensorRT to get better performance.
The workflow will be .pb -> ONNX -> TRT. If any layer is not supported, you will need to create a custom plugin.
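As a rough sketch of that workflow (paths, filenames, and the opset are placeholders; exact trtexec flags vary between TensorRT versions, so check trtexec --help on your install):

```shell
# 1. SavedModel -> ONNX, using tf2onnx
python -m tf2onnx.convert \
    --saved-model ./1588775388 \
    --output bert.onnx \
    --opset 11

# 2. ONNX -> serialized TensorRT engine, using trtexec
trtexec --onnx=bert.onnx --saveEngine=bert.trt --fp16
```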

Thanks

I managed to optimize the model using tf2onnx and then converting from ONNX to TensorRT. Thanks for your help!