Running TensorFlow 1.15-produced GraphDefs with TF2-based TensorRT: the TensorRT model is not building correctly

Description

We are trying to set up a TensorRT server on our platform for real-time inference that can serve both models originally created with TensorFlow 1.15 and models created with TensorFlow 2.4.1. Because the TF 1.15 models were mostly built with tf-slim, models from both versions are exported as GraphDefs on our platform. Converting these to TensorRT models used to be a straightforward process, since the GraphDefs could be converted directly (originally done with nvcr.io/nvidia/tensorflow:20.03-tf1-py3). With TF-TRT in 2.4.1, however, we first have to convert the GraphDefs to SavedModels and then convert those to TensorRT models. By doing this, we face two problems:

  • Older models that go through this pipeline always return predictions in which every class probability is 0, except possibly one class that is exactly 1. This is clearly erroneous; the probabilities should be distributed across the output classes.
  • Inference has become much slower. Previously it took about 20 ms; now the first run averages roughly 5-6 s per request (individual requests spike to ~20 s while the TensorRT engines are being built) and later requests settle at 0.170-0.180 s. Our downstream systems require faster inference times.

Note that there is a separate branch for Keras compatibility, which would let us skip the GraphDef part of the pipeline. However, we would lose support for the older models, so this version of the pipeline needs to be fixed as well. No part of this question relates to the ongoing Keras efforts.
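
For reference, the GraphDef-to-SavedModel step that TF-TRT 2.x now forces on us looks roughly like the sketch below. This is a minimal illustration rather than the exact code in the attached scripts; the paths and tensor names (input:0, InceptionV3/Predictions/Reshape_1:0) are assumptions based on the Triton logs further down.

import tensorflow as tf

# Minimal sketch: wrap a frozen GraphDef (.pb) into a SavedModel so that
# TrtGraphConverterV2 (TF-TRT in TF 2.x) can consume it.
# Paths and tensor names are placeholders, not the exact ones from our scripts.
GRAPHDEF_PATH = "open_images_open_images_inception_V3_TF1_100k.pb"
EXPORT_DIR = "open_images_open_images_inception_V3_TF1_100k/saved"

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(GRAPHDEF_PATH, "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    tf.compat.v1.import_graph_def(graph_def, name="")
    in_tensor = sess.graph.get_tensor_by_name("input:0")
    out_tensor = sess.graph.get_tensor_by_name("InceptionV3/Predictions/Reshape_1:0")
    tf.compat.v1.saved_model.simple_save(
        sess, EXPORT_DIR,
        inputs={"input": in_tensor},
        outputs={"out": out_tensor},
    )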

Environment

TensorRT Version: Not entirely sure; we now use the TF-TRT that ships with TF 2.4.1, and previously used tensorflow.python.compiler.tensorrt from TF 1.15.

GPU Type: GeForce GTX 1080 Ti

Nvidia Driver Version:

NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3

CUDA Version:

CUDA Version: 11.3

CUDNN Version:

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 0

Operating System + Version:

NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Python Version (if applicable): 3.6 while training the model, 3.8.5 in the container
TensorFlow Version (if applicable): As stated above, 1.15 and 2.4.1 for training the models, and 2.4.0 within the container to export the model.
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:21.03-py3

Relevant Files

There is an attached zipped file that unzips to the following structure:

nvidia_reproducible/
L__ tf2
    |__ Pipfile
    |__ export_model.py
    |__ export_model_main.py
    |__ models/
        |__ open_images_inception_V3_TF2
            |__ config.ini
            |__ model.ckpt-8000.data-00000-of-00001
            |__ model.ckpt-8000.index
            |__ model.ckpt-8000.meta
            L__ open_images_inception_V3_TF2.pb
        L__ open_images_open_images_inception_V3_TF1_100k
            |__ config.ini
            |__ model.ckpt-100000.data-00000-of-00001
            |__ model.ckpt-100000.index
            |__ model.ckpt-100000.meta
            L__ open_images_open_images_inception_V3_TF1_100k.pb

Link to the compressed folder. To run the inference, use the file check_inference.py. Please install the libraries listed at the top of the Python script before running it.

open_images_inception_V3_TF2 is the model that was trained exclusively in TF2 and exported using the code in the compressed folder. open_images_open_images_inception_V3_TF1_100k was trained and exported with TF1 code, but needs to run on the new TensorRT server.
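
check_inference.py itself is in the compressed folder and is not reproduced here. As a rough illustration only, it does something along the lines of the sketch below with the Triton HTTP client; the tensor names input/out and the URL mirror the server logs later in this post, while everything else (batching, preprocessing) is an assumption.

import time

import numpy as np
import tritonclient.http as httpclient

# Rough sketch of a Triton HTTP inference loop similar in spirit to check_inference.py.
# MODEL_NAME, NUM_TRIES and the tensor names mirror the logs in this post;
# the random input data is only for illustration.
MODEL_NAME = 'open_images_open_images_inception_V3_TF1_100k'
NUM_TRIES = 10

client = httpclient.InferenceServerClient(url="localhost:8000")
batch = np.random.random((2, 299, 299, 3)).astype(np.float32)

timings = []
for _ in range(NUM_TRIES):
    infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested_output = httpclient.InferRequestedOutput("out")

    start = time.time()
    result = client.infer(MODEL_NAME, inputs=[infer_input],
                          outputs=[requested_output])
    timings.append(time.time() - start)
    print(result.as_numpy("out"))

print("total time is", sum(timings))
print("average time is", sum(timings) / len(timings))
print(timings)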

Steps To Reproduce

Pipenv Environment

A Pipfile is included in the compressed folder so you can check the packages used in the repository. If it is not needed, please ignore this file.

Exporting the model from a checkpoint

Within the tf2/ folder of the compressed archive, the function export_model() from export_model.py was used to convert a checkpoint into a GraphDef. This is usually done either on the cloud or on the Ubuntu device itself. The resulting open_images_open_images_inception_V3_TF1_100k.pb, generated from the checkpoint model.ckpt-100000, is provided in the compressed folder under tf1/ and open_images_open_images_inception_V3_TF1_100k/, and likewise for open_images_inception_V3_TF2. A minimal sketch of this checkpoint-to-GraphDef step follows.
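
The actual logic lives in export_model.py; the following is only a minimal sketch of the standard freeze-graph flow it presumably follows, and the output node name is a placeholder assumption.

import tensorflow as tf

# Minimal sketch of freezing a checkpoint into a GraphDef (.pb).
# The real logic is in export_model.py; the names below are placeholders.
CKPT_PREFIX = "models/open_images_open_images_inception_V3_TF1_100k/model.ckpt-100000"
OUTPUT_NODE = "InceptionV3/Predictions/Reshape_1"  # assumed output node name
PB_PATH = "open_images_open_images_inception_V3_TF1_100k.pb"

with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    saver = tf.compat.v1.train.import_meta_graph(CKPT_PREFIX + ".meta")
    saver.restore(sess, CKPT_PREFIX)
    frozen_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, [OUTPUT_NODE])

with tf.io.gfile.GFile(PB_PATH, "wb") as f:
    f.write(frozen_graph_def.SerializeToString())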

Exporting the tensorRT model

Then the docker image is run with the command:

docker run --gpus all --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -eMODEL_NAME=open_images_open_images_inception_V3_TF1_100k -ti -v<path_to_project>/nvidia_reproducible/tf2/:/trainer nvcr.io/nvidia/tensorflow:21.03-tf2-py3

Once inside the container, run these commands:

cd /trainer
apt-get update && apt-get install -y libcurl4 libcurl4-openssl-dev
export PYTHONPATH=`pwd`
mkdir /opt/tensorflow/horovod-source/.eggs/
touch /opt/tensorflow/horovod-source/.eggs/easy-install.pth    
pip install tensorflow-probability==0.8
pip install opencv-python-headless
pip install tensorrtserver
pip install tf_slim
pip install nvidia-pyindex
pip install tritonclient
python ./export_model_main.py --model_dir=${MODEL_NAME} --device_query_tool ''
# Now to convert the TF2 model
python ./export_model_main.py --model_dir=open_images_inception_V3_TF2 --device_query_tool ''
exit

Running the TensorRT server (Triton)

Now to run the server, this command is called:

docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v<path_to_project>/nvidia_reproducible/tf2/models:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models --log-verbose=1
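
For context on what the directory mounted at /models must contain: the Triton TensorFlow backend generally expects one folder per model with a numbered version directory holding the (TF-TRT-converted) SavedModel, roughly as below. The exact layout produced by export_model_main.py may differ, so treat this as an assumption.

models/
L__ open_images_open_images_inception_V3_TF1_100k/
    |__ config.pbtxt
    L__ 1/
        L__ model.savedmodel/
            |__ saved_model.pb
            L__ variables/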

Running Inference for the TF1 model

Now if we run check_inference.py with NUM_TRIES = 10 and MODEL_NAME = 'open_images_open_images_inception_V3_TF1_100k', we get this output on the server side:

I0522 04:39:54.510930 1 tensorflow.cc:2100] model open_images_open_images_inception_V3_TF1_100k, instance open_images_open_images_inception_V3_TF1_100k_0_1, executing 1 requests
I0522 04:39:54.510965 1 tensorflow.cc:1389] TRITONBACKEND_ModelExecute: Running open_images_open_images_inception_V3_TF1_100k_0_1 with 1 requests
I0522 04:39:54.511341 1 tensorflow.cc:1617] TRITONBACKEND_ModelExecute: input 'input' is GPU tensor: false
I0522 04:39:54.518132 1 infer_response.cc:165] add response output: output: out, type: FP32, shape: [2,601]
I0522 04:39:54.518173 1 http_server.cc:1200] HTTP using buffer for: 'out', size: 4808, addr: 0x7f1f9a21dc60
I0522 04:39:54.518193 1 tensorflow.cc:1800] TRITONBACKEND_ModelExecute: output 'out' is GPU tensor: false
I0522 04:39:54.518245 1 http_server.cc:1215] HTTP release: size 4808, addr 0x7f1f9a21dc60
I0522 04:39:54.518270 1 tensorflow.cc:1858] TRITONBACKEND_ModelExecute: model open_images_open_images_inception_V3_TF1_100k_0_1 released 1 requests
I0522 04:39:54.801961 1 http_server.cc:1229] HTTP request: 2 /v2/models/open_images_open_images_inception_V3_TF1_100k/infer
I0522 04:39:54.802026 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:39:54.802044 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:39:54.898306 1 infer_request.cc:502] prepared: [0x0x7f210c009560] request id: , model: open_images_open_images_inception_V3_TF1_100k, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f210c008098] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
override inputs:
inputs:
[0x0x7f210c008098] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
original requested outputs:
requested outputs:
out

I0522 04:39:54.898440 1 tensorflow.cc:2100] model open_images_open_images_inception_V3_TF1_100k, instance open_images_open_images_inception_V3_TF1_100k_0_1, executing 1 requests
I0522 04:39:54.898478 1 tensorflow.cc:1389] TRITONBACKEND_ModelExecute: Running open_images_open_images_inception_V3_TF1_100k_0_1 with 1 requests
I0522 04:39:54.898838 1 tensorflow.cc:1617] TRITONBACKEND_ModelExecute: input 'input' is GPU tensor: false
I0522 04:39:54.905740 1 infer_response.cc:165] add response output: output: out, type: FP32, shape: [2,601]
I0522 04:39:54.905789 1 http_server.cc:1200] HTTP using buffer for: 'out', size: 4808, addr: 0x7f1f9a21dc60
I0522 04:39:54.905807 1 tensorflow.cc:1800] TRITONBACKEND_ModelExecute: output 'out' is GPU tensor: false
I0522 04:39:54.905860 1 http_server.cc:1215] HTTP release: size 4808, addr 0x7f1f9a21dc60
I0522 04:39:54.905892 1 tensorflow.cc:1858] TRITONBACKEND_ModelExecute: model open_images_open_images_inception_V3_TF1_100k_0_1 released 1 requests

And on the side of check_inference.py, we get:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
total time is 61.868407249450684
average time is 5.981230926513672
[20.122692584991455, 19.282654523849487, 19.213167190551758, 0.17621302604675293, 0.16901206970214844, 0.17193841934204102, 0.171095609664917, 0.16964077949523926, 0.1677391529083252, 0.16815590858459473]

The second time check_inference was called:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
total time is 3.9601998329162598
average time is 0.17387678623199462
[0.19607853889465332, 0.1770946979522705, 0.17362165451049805, 0.17151832580566406, 0.17002081871032715, 0.16779732704162598, 0.17086052894592285, 0.16854619979858398, 0.17175626754760742, 0.17147350311279297]

Running Inference for the TF2 model

Now if we run check_inference.py with NUM_TRIES = 10 and MODEL_NAME = 'open_images_inception_V3_TF2', we get this output on the server side:

I0522 04:47:32.612510 1 grpc_server.cc:3427] New request handler for ModelStreamInferHandler, 3
I0522 04:47:32.612555 1 grpc_server.cc:2146] Thread started for ModelStreamInferHandler
I0522 04:47:32.612572 1 grpc_server.cc:3983] Started GRPCInferenceService at 0.0.0.0:8001
I0522 04:47:32.613084 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0522 04:47:32.655274 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
I0522 04:49:07.003303 1 http_server.cc:1229] HTTP request: 0 /v2/health/live
I0522 04:49:07.274691 1 http_server.cc:1229] HTTP request: 2 /v2/models/open_images_open_images_inception_V3_TF1_100k/infer
I0522 04:49:07.274759 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:49:07.274778 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:49:07.372327 1 infer_request.cc:502] prepared: [0x0x7f5698003cd0] request id: , model: open_images_open_images_inception_V3_TF1_100k, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f5698001f38] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
override inputs:
inputs:
[0x0x7f5698001f38] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
original requested outputs:
requested outputs:
out

I0522 04:49:07.372520 1 tensorflow.cc:2100] model open_images_open_images_inception_V3_TF1_100k, instance open_images_open_images_inception_V3_TF1_100k_0_2, executing 1 requests
I0522 04:49:07.372564 1 tensorflow.cc:1389] TRITONBACKEND_ModelExecute: Running open_images_open_images_inception_V3_TF1_100k_0_2 with 1 requests
I0522 04:49:07.374647 1 tensorflow.cc:1617] TRITONBACKEND_ModelExecute: input 'input' is GPU tensor: false
2021-05-22 04:49:08.245720: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:568] remapper failed: Invalid argument: Mutation::Apply error: multiple nodes with the name: 'InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNormV3/NCHWShapedOffset' exists in Mutation.
2021-05-22 04:49:08.954258: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:568] remapper failed: Invalid argument: Mutation::Apply error: multiple nodes with the name: 'InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/FusedBatchNormV3/NCHWShapedOffset' exists in Mutation.
2021-05-22 04:49:09.254271: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for PartitionedCall_1/PartitionedCall/InceptionV3/TRTEngineOp_0_0 input shapes: [[2,299,299,3]]
2021-05-22 04:49:09.254794: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer_plugin.so.7
I0522 04:49:26.836016 1 infer_response.cc:165] add response output: output: out, type: FP32, shape: [2,601]
I0522 04:49:26.836082 1 http_server.cc:1200] HTTP using buffer for: 'out', size: 4808, addr: 0x7f565fd13ff0
I0522 04:49:26.836104 1 tensorflow.cc:1800] TRITONBACKEND_ModelExecute: output 'out' is GPU tensor: false
I0522 04:49:26.836179 1 http_server.cc:1215] HTTP release: size 4808, addr 0x7f565fd13ff0
I0522 04:49:26.836211 1 tensorflow.cc:1858] TRITONBACKEND_ModelExecute: model open_images_open_images_inception_V3_TF1_100k_0_2 released 1 requests
I0522 04:49:27.121741 1 http_server.cc:1229] HTTP request: 2 /v2/models/open_images_open_images_inception_V3_TF1_100k/infer
I0522 04:49:27.121809 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:49:27.121828 1 model_repository_manager.cc:656] GetInferenceBackend() 'open_images_open_images_inception_V3_TF1_100k' version -1
I0522 04:49:27.220424 1 infer_request.cc:502] prepared: [0x0x7f5698004c90] request id: , model: open_images_open_images_inception_V3_TF1_100k, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f5698004978] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
override inputs:
inputs:
[0x0x7f5698004978] input: input, type: FP32, original shape: [2,299,299,3], batch + shape: [2,299,299,3], shape: [299,299,3]
original requested outputs:
requested outputs:
out

And on the check_inference.py side, the output is:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
total time is 61.49000334739685
average time is 5.935466170310974
[19.63507580757141, 0.17602181434631348, 19.014800310134888, 0.17351651191711426, 0.17123889923095703, 19.48599362373352, 0.17773032188415527, 0.17619085311889648, 0.17372965812683105, 0.1703639030456543]

Now if we run check_inference.py the second time, the output is:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
total time is 3.957824230194092
average time is 0.17999637126922607
[0.23171639442443848, 0.1756727695465088, 0.17092442512512207, 0.17671537399291992, 0.17761540412902832, 0.16989755630493164, 0.1754920482635498, 0.17484045028686523, 0.1747884750366211, 0.1723008155822754]

Hi,
We recommend you check the sample links below, as they might answer your concern.

If the issue persists, we request you to share the model and script so that we can try reproducing the issue at our end.
Thanks!

Hi, the problem is persisting. I have attached the script, the model, and all the necessary files to the question.

Hi @sharan,

We recommend you post your concern on the Triton forum to get better help.

Thank you.

I believe this to be more of a TensorRT issue, but I will post it on the Triton Forum too.

Hi @sharan,

Sorry for the delayed response. As you're using the Triton Inference Server to deploy the model, we requested that you post on the Triton forum to get better assistance if it is a Triton-related issue.
We recommend you test inference normally (without Triton deployment) and, if you still face this issue, share a model and scripts/steps with which we can reproduce it.

Thank you.

@sharan It appears that this issue has nothing to do with any NVIDIA product as far as we can tell:

Containers used in this report: the TF1 and TF2 NGC TensorFlow containers referenced in the outputs below.

1. The provided checkpoints cannot be loaded:

Not being able to load the checkpoint is not an issue in itself; however, it prevents us from regenerating the SavedModel.

import os
import tensorflow as tf

if int(tf.version.VERSION[0]) == 2:
    tftrt_checkpoint_file = os.path.join(
        "open_images_inception_V3_TF2", "model.ckpt-8000")
else:
    tftrt_checkpoint_file = os.path.join(
        "open_images_open_images_inception_V3_TF1_100k", "model.ckpt-100000")


# Add ops to save and restore all the variables.
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    new_saver = tf.compat.v1.train.import_meta_graph(
        '%s.meta' % tftrt_checkpoint_file
    )
    new_saver.restore(sess, tftrt_checkpoint_file)

####################### OUTPUT IN TF1 CONTAINER #######################

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op ‘TPUReplicatedInput’ used by node input0 (defined at /usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [T=DT_INT32, N=8] Registered devices: [CPU, GPU, XLA_CPU, XLA_GPU] Registered kernels: [[input0]]

####################### OUTPUT IN TF2 CONTAINER #######################

Traceback (most recent call last):
  File "", line 2, in
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1465, in import_meta_graph
    return _import_meta_graph_with_return_elements(meta_graph_or_file,
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/saver.py", line 1486, in _import_meta_graph_with_return_elements
    meta_graph.import_scoped_meta_graph_with_return_elements(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/meta_graph.py", line 887, in import_scoped_meta_graph_with_return_elements
    col_op = graph.as_graph_element(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3755, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3795, in _as_graph_element_locked
    raise KeyError("The name %s refers to a Tensor which does not "
KeyError: "The name 'aux_loss/value:0' refers to a Tensor which does not exist. The operation, 'aux_loss/value', does not exist in the graph."

2. The SavedModel provided produces erroneous outputs:

The SavedModel is buggy and systematically returns 1 for the same class regardless of the input.
This has nothing to do with TF-TRT or Triton, since the issue can be reproduced with pure TensorFlow code:

inference.py

import os
import shutil

import numpy as np
import tensorflow as tf

from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.framework import convert_to_constants


# Usage:
# - python inference.py --v1 --use_native_tf
# - python inference.py --v1 --use_tf_trt
# - python inference.py --v2 --use_native_tf
# - python inference.py --v2 --use_tf_trt


if int(tf.version.VERSION[0]) == 1:
    tf.compat.v1.enable_eager_execution()


def _get_func_from_saved_model(saved_model_dir):
    saved_model_loaded = tf.compat.v2.saved_model.load(
        saved_model_dir, tags=[tag_constants.SERVING])
    _graph_func = saved_model_loaded.signatures[
        signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
    _graph_func = convert_to_constants.convert_variables_to_constants_v2(
        _graph_func
    )

    return _graph_func


def get_graph_func(input_saved_model_dir,
                   output_saved_model_dir=None,
                   use_trt=False):
    """Retrieves a frozen SavedModel and optionally applies TF-TRT
    use_trt: bool, if true use TensorRT
    returns: TF function that is ready to run for inference
    """

    saved_model_dir = input_saved_model_dir

    if use_trt:

        conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS

        conversion_params = conversion_params._replace(
            max_workspace_size_bytes=(1 << 32),
            precision_mode="FP32",
            maximum_cached_engines=100,
            minimum_segment_size=3
        )

        converter = trt.TrtGraphConverterV2(
            input_saved_model_dir=input_saved_model_dir,
            conversion_params=conversion_params,
        )

        converter.convert()
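        # Note: convert() only rewrites the graph to use TRTEngineOp nodes; the
        # actual TensorRT engines are built lazily at the first inference for each
        # new input shape unless converter.build() is called beforehand, which is
        # consistent with the long first-request latencies reported above.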

        try:
            saved_model_dir = output_saved_model_dir
            shutil.rmtree(output_saved_model_dir)
        except FileNotFoundError:
            pass

        converter.save(output_saved_model_dir=saved_model_dir)

    graph_func = _get_func_from_saved_model(saved_model_dir)

    return graph_func


def get_inference_dataset(batch_size=1, num_batches=1):
    np_data = np.random.random((batch_size*num_batches, 299, 299, 3)).astype(
        np.float32
    )
    ds = tf.data.Dataset.from_tensor_slices(np_data)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds


if __name__ == "__main__":


    def define_args():

        import argparse
        parser = argparse.ArgumentParser('joyful_panda')

        tf_version_parser = parser.add_mutually_exclusive_group(required=True)
        tf_version_parser.add_argument("--v1", dest="USE_TF_V2", action="store_false", default=False)
        tf_version_parser.add_argument("--v2", dest="USE_TF_V2", action="store_true")
        tf_version_parser.set_defaults(name=False)

        tf_trt_parser = parser.add_mutually_exclusive_group(required=True)
        tf_trt_parser.add_argument("--use_native_tf", dest="USE_TF_TRT", action="store_false", default=False)
        tf_trt_parser.add_argument("--use_tf_trt", dest="USE_TF_TRT", action="store_true")
        tf_trt_parser.set_defaults(name=False)

        _flags, unknown_args = parser.parse_known_args()

        if len(unknown_args) > 0:

            for bad_arg in unknown_args:
                print("ERROR: Unknown command line arg: %s" % bad_arg)

            raise ValueError("Invalid command line arg(s)")

        return _flags

    FLAGS = define_args()

    BATCH_SIZE = 10
    NUM_BATCHES = 1

    if FLAGS.USE_TF_V2:
        SAVED_MODEL_PATH = "open_images_inception_V3_TF2/saved"
    else:
        SAVED_MODEL_PATH = "open_images_open_images_inception_V3_TF1_100k/saved"

    concrete_func = get_graph_func(
        input_saved_model_dir=SAVED_MODEL_PATH,
        output_saved_model_dir="saved_models/tf_trt",
        use_trt=FLAGS.USE_TF_TRT
    )
    for step_idx, data_batch in enumerate(get_inference_dataset(
        BATCH_SIZE, NUM_BATCHES
    )):
        classes = tf.math.argmax(concrete_func(data_batch)[0], axis=1)
        print("================ BATCH: %02d =================" % (step_idx + 1))
        print("Predicted Classes:", classes)


####################### OUTPUT IN TF1 CONTAINER #######################
# python inference.py --v1 --use_native_tf

================ BATCH: 01 =================
Predicted Classes: tf.Tensor([68 68 68 68 68 68 68 68 68 68], shape=(10,), dtype=int64)

####################### OUTPUT IN TF2 CONTAINER #######################
# python inference.py --v2 --use_native_tf

================ BATCH: 01 =================
Predicted Classes: tf.Tensor([229 229 229 229 229 229 229 229 229 229], shape=(10,), dtype=int64)

Unless the user fixes their code and model, there’s nothing we can do. We’ll stop investigating this bug until further evidence that this comes from our product stack.