Cannot convert a TensorFlow saved_model to a saved_model optimized by TensorRT

86108429 · March 23, 2019, 11:00am

Following the steps in https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a, I downloaded the ResNet-50 saved_model for TensorFlow Serving with:

curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz

Then I optimized this ResNet-50 saved_model with TensorRT by running (using Docker):

docker run --rm --runtime=nvidia -it \
    -v /tmp:/tmp tensorflow/tensorflow:nightly-gpu \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/resnet/1538686847 \
    --output_dir /tmp/resnet_trt/1538686847 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True

It shows no errors. But the directory /tmp/resnet_trt/1538686847/variables/ is empty: there is no variables.data-00000-of-00001 or variables.index. Under /tmp/resnet_trt/1538686847 there is only a frozen graph file, saved_model.pb, which is bigger than the original (it looks like a TensorRT engine). Did the conversion fail? How can I fix it?

Looking forward to your reply.

alexander.spivakovsky · March 24, 2019, 9:43am

You need to convert your TensorFlow .pb graph to UFF format first.

86108429 · March 25, 2019, 3:54am

@alexander.spivakovsky

I read the tutorial https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#integrate-ovr, but it doesn't say the TensorFlow .pb graph has to be converted to UFF first. It seems we just need to call the function create_inference_graph (tensorflow/trt_convert.py at master · tensorflow/tensorflow · GitHub), which works on a saved_model, a .pb graph, or a checkpoint.
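For comparison, here is a minimal sketch of the Python call that, as I understand the TF 1.13 saved_model_cli source, the convert ... tensorrt subcommand wraps. It assumes the TF 1.x tf.contrib.tensorrt.create_inference_graph signature (input_saved_model_dir, input_saved_model_tags, output_saved_model_dir, precision_mode, max_batch_size, is_dynamic_op). The helper names trt_convert_kwargs and convert are hypothetical, and newer TensorFlow releases replaced this function with a converter class, so check it against your version. No UFF step is involved on this path.

```python
def trt_convert_kwargs(saved_model_dir, output_dir, precision_mode="FP32",
                       max_batch_size=1, is_dynamic_op=True, tags=("serve",)):
    """Map the saved_model_cli 'convert ... tensorrt' flags onto keyword
    arguments for tf.contrib.tensorrt.create_inference_graph (TF 1.x, assumed)."""
    return {
        # With a SavedModel as input, the graph and its outputs are read from
        # the SavedModel itself, so these two are left as None.
        "input_graph_def": None,
        "outputs": None,
        "input_saved_model_dir": saved_model_dir,
        "input_saved_model_tags": list(tags),   # --tag_set serve
        "output_saved_model_dir": output_dir,
        "precision_mode": precision_mode,       # --precision_mode FP32
        "max_batch_size": max_batch_size,       # --max_batch_size 1
        "is_dynamic_op": is_dynamic_op,         # --is_dynamic_op True
    }


def convert(saved_model_dir, output_dir, **flags):
    # Imported lazily: tf.contrib.tensorrt only exists in TF 1.x builds with TensorRT.
    from tensorflow.contrib import tensorrt as trt
    return trt.create_inference_graph(
        **trt_convert_kwargs(saved_model_dir, output_dir, **flags))
```

Inside one of the containers above, convert("/tmp/resnet/1538686847", "/tmp/resnet_trt/1538686847") would correspond to the CLI command in the first post.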

NVES_R · March 25, 2019, 6:15pm

Hi,

For future reference, tensorflow/tensorflow:nightly-gpu is not one of our containers. We can better support containers from https://ngc.nvidia.com, such as this one: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

Thanks,
NVIDIA Enterprise Support

86108429 · March 26, 2019, 11:50am

@NVES_R Thanks for your reply. After pulling nvcr.io/nvidia/tensorflow:19.03-py2 from https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow, I ran:

docker run --rm --runtime=nvidia -it --env CUDA_VISIBLE_DEVICES=2 \
    -v /tmp:/tmp nvcr.io/nvidia/tensorflow:19.03-py2 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/resnet/1538686847 \
    --output_dir /tmp/resnet_trt/1538686847 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True

The log suggests the conversion succeeded:

2019-03-26 11:43:20.122105: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 1
2019-03-26 11:43:20.299920: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_model/TRTEngineOp_0 added for segment 0 consisting of 199 nodes succeeded.
2019-03-26 11:43:20.358346: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-03-26 11:43:20.358418: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618]   constant folding: Graph size after: 308 nodes (-112), 363 edges (-114), time = 313.582ms.
2019-03-26 11:43:20.358432: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618]   layout: Graph size after: 308 nodes (0), 363 edges (0), time = 73.565ms.
2019-03-26 11:43:20.358445: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618]   constant folding: Graph size after: 308 nodes (0), 363 edges (0), time = 127.602ms.
2019-03-26 11:43:20.358457: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618]   TensorRTOptimizer: Graph size after: 110 nodes (-198), 157 edges (-206), time = 470.955ms.

But the directory /tmp/resnet_trt/1538686847/variables/ is still empty (no variables.data-00000-of-00001 or variables.index), and under /tmp/resnet_trt/1538686847 there is still only a frozen graph file, saved_model.pb, which is bigger than the original (it looks like a TensorRT engine).
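The converter log itself shows how much of the graph ended up in TensorRT. Below is a small sketch that extracts the segment and node counts from log lines like the ones above. The regular expressions are assumptions matched against this particular TF 1.13 log format; other versions may word the messages differently, and summarize_trt_log is a hypothetical helper. For the ResNet-50 log above it reports one engine absorbing 199 of the 308 nodes present before the TensorRT pass (about 65%).

```python
import re

# Patterns matched against the TF 1.13 contrib/tensorrt log lines quoted above;
# other TensorFlow versions may word these messages differently.
CANDIDATES_RE = re.compile(r"Number of TensorRT candidate segments: (\d+)")
ENGINE_RE = re.compile(
    r"TensorRT node (\S+) added for segment \d+ consisting of (\d+) nodes succeeded"
)
TRT_PASS_RE = re.compile(r"TensorRTOptimizer: Graph size after: (\d+) nodes \((-?\d+)\)")


def summarize_trt_log(log_text):
    """Report how many graph nodes TF-TRT moved into TRTEngineOp nodes."""
    candidates = CANDIDATES_RE.search(log_text)
    engines = [(name, int(size)) for name, size in ENGINE_RE.findall(log_text)]
    converted = sum(size for _, size in engines)

    nodes_before = None
    trt_pass = TRT_PASS_RE.search(log_text)
    if trt_pass:
        after, delta = int(trt_pass.group(1)), int(trt_pass.group(2))
        # The logged delta is signed, so size before = size after - delta.
        nodes_before = after - delta

    return {
        "candidate_segments": int(candidates.group(1)) if candidates else 0,
        "engines": engines,
        "nodes_converted": converted,
        "nodes_before_trt_pass": nodes_before,
        "converted_fraction": converted / nodes_before if nodes_before else None,
    }
```

TensorFlow writes these messages to stderr, so the converter output can be captured with something like 2>&1 | tee convert.log (a hypothetical file name) and passed in as text. A low converted_fraction is one thing to check when TensorRT gives little speedup.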

NVES_R · March 26, 2019, 6:11pm

Hi,

Unfortunately, this is a TensorFlow API, so TensorRT doesn't maintain documentation on saved_model_cli. Maybe this will help: https://www.tensorflow.org/guide/saved_model

However, for what it's worth, I followed the steps from https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a, and the output is still named saved_model.pb, but its size has changed and it looks like this is a TensorRT engine (just poorly named by saved_model_cli). The variables directory is empty as well, so this seems to be expected.
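A likely explanation, based on my understanding of TF-TRT in TensorFlow 1.x rather than anything confirmed in this thread: the converter freezes the graph first, turning variables into constants, so the weights are stored inside saved_model.pb and nothing is written to variables/. With --is_dynamic_op True the TensorRT engines are typically built at runtime, so the file may hold the frozen subgraphs behind the TRTEngineOp nodes rather than a fully serialized engine. A small stdlib helper (describe_export is a hypothetical name) to compare the original and converted export directories:

```python
import os


def describe_export(export_dir):
    """Summarize what a SavedModel export directory contains on disk."""
    pb_path = os.path.join(export_dir, "saved_model.pb")
    variables_dir = os.path.join(export_dir, "variables")
    has_pb = os.path.isfile(pb_path)
    variable_files = (
        sorted(os.listdir(variables_dir)) if os.path.isdir(variables_dir) else []
    )
    return {
        "saved_model_pb_bytes": os.path.getsize(pb_path) if has_pb else None,
        "variable_files": variable_files,
        # True when the graph file exists but variables/ is empty, i.e. any
        # weights must be stored as constants inside saved_model.pb.
        "weights_inlined": has_pb and not variable_files,
    }


for d in ("/tmp/resnet/1538686847", "/tmp/resnet_trt/1538686847"):
    if os.path.isdir(d):
        print(d, describe_export(d))
```

For the converted export described in this thread, variable_files should come back empty and weights_inlined should be True.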

If you try to serve it using the steps from the article:

docker run --rm --runtime=nvidia -p 8501:8501 \
    --name tfserving_resnet \
    -v /tmp/resnet_trt:/models/resnet \
    -e MODEL_NAME=resnet \
    -t tensorflow/serving:latest-gpu &

python /tmp/resnet/resnet_client.py

It seems to be working.
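For reference, a minimal stdlib sketch of the kind of REST request a client like resnet_client.py sends. It assumes the port mapping (8501) and MODEL_NAME=resnet from the command above, and that the serving signature takes a single JPEG-bytes input, which TensorFlow Serving's REST API accepts as a base64 object of the form {"b64": ...}. The function names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: REST port 8501 and model name "resnet", as in the docker command above.
PREDICT_URL = "http://localhost:8501/v1/models/resnet:predict"


def build_predict_body(jpeg_bytes):
    """JSON body for the REST predict API; binary inputs go in a {"b64": ...} object."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return json.dumps({"instances": [{"b64": encoded}]}).encode("utf-8")


def predict(jpeg_bytes, url=PREDICT_URL, timeout=30.0):
    """POST one image to TensorFlow Serving and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_predict_body(jpeg_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, predict(open("cat.jpg", "rb").read()) would send one image (cat.jpg is a placeholder path).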

Thanks,
NVIDIA Enterprise Support

86108429 · March 27, 2019, 7:23am

@NVES_R Yes, you are right: the saved_model.pb under /tmp/resnet_trt/1538686847 is bigger, and it looks like a TensorRT engine. I have updated my question.
Then I served it with TensorFlow Serving and ran python /tmp/resnet/resnet_client.py. It works. Thank you very much. But I am just wondering why only a single frozen graph file can be served by TensorFlow Serving.

NVES_R · March 27, 2019, 7:34am

Hi,

I'm not too sure of TensorFlow Serving's capabilities, but I do know that TensorRT Inference Server can serve a configurable number of instances of multiple models. I would recommend checking it out; it is also available as a container on NGC.

Thanks,
NVIDIA Enterprise Support

86108429 · March 27, 2019, 1:55pm

@NVES_R Thanks. After doing some tests, I found a problem:
I have a saved_model of my own model with another signature_def, predict_images, in addition to the default signature_def serving_default. But after converting it with the tensorflow/tensorflow:nightly-gpu Docker image:

docker run --rm --runtime=nvidia -it --env CUDA_VISIBLE_DEVICES=2 \
    -v /tmp:/tmp tensorflow/tensorflow:nightly-gpu \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/myModel/111111 \
    --output_dir /tmp/myModel_trt/111111 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True

I find that the new TensorRT-optimized saved_model.pb (under /tmp/myModel_trt/111111) no longer has the predict_images signature_def; it only has the default signature_def serving_default. But if I use the nvcr.io/nvidia/tensorflow:19.03-py2 Docker image from NGC and run the same command, this problem does not appear.
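One way to check which signatures each export kept is to compare the keys reported by saved_model_cli show. The sketch below assumes saved_model_cli is on the PATH (for example inside one of the containers above) and that show --dir ... --tag_set serve prints lines of the form SignatureDef key: "name"; the helper names are hypothetical.

```python
import re
import subprocess

# Assumed output format of `saved_model_cli show --dir D --tag_set serve`.
SIGNATURE_KEY_RE = re.compile(r'SignatureDef key: "([^"]+)"')


def parse_signature_keys(show_output):
    """Extract SignatureDef keys from saved_model_cli show output."""
    return set(SIGNATURE_KEY_RE.findall(show_output))


def signature_keys(saved_model_dir, tag_set="serve"):
    """Run saved_model_cli show and return the signature keys it lists."""
    out = subprocess.run(
        ["saved_model_cli", "show", "--dir", saved_model_dir, "--tag_set", tag_set],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    ).stdout
    return parse_signature_keys(out)


def missing_signatures(original_dir, converted_dir):
    """Signature keys present in the original export but absent after conversion."""
    return signature_keys(original_dir) - signature_keys(converted_dir)
```

Based on the behavior reported above, missing_signatures("/tmp/myModel/111111", "/tmp/myModel_trt/111111") would be expected to return {"predict_images"} for the nightly-gpu conversion and an empty set for the NGC one.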

Thank you very much!

NVES_R · March 27, 2019, 9:23pm

Hi,

I’m sorry to hear that, but we do not support the tensorflow/tensorflow:nightly-gpu image.

If the nvcr.io/nvidia/tensorflow:19.03-py2 image is working for you, then I’d suggest using that, because we can better support that image.

Thanks,
NVIDIA Enterprise Support

86108429 · March 28, 2019, 7:08am

@NVES_R Thanks for your reply. Sorry to bother you again. I now have two more questions.
I use the saved_model from https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz and optimize it with TensorRT following the steps at https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a. Then I modified resnet_client_grpc.py (https://github.com/tensorflow/serving/blob/master/tensorflow_serving/example/resnet_client_grpc.py) to measure the inference time:

...
    # Assumes `import time` and `import numpy as np` in the elided part of the script.
    request.model_spec.name = 'resnet'
    request.model_spec.signature_name = 'predict'
    request.inputs['image_bytes'].CopyFrom(
        tf.contrib.util.make_tensor_proto(data, shape=[1]))

    times = []
    for _ in range(1000):
        start = time.time()
        result = stub.Predict(request, 30.0)  # 30 secs timeout
        end = time.time()
        times.append((end - start) * 1000.0)  # latency in ms

    avg_time = np.mean(times[100:])  # skip the first 100 (warm-up) runs
    print("predict image cat.jpg avg_time: %f ms\n" % avg_time)
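The mean alone can hide tail latency (occasional slow requests), so it can help to report percentiles as well. A stdlib sketch that could replace the np.mean(times[100:]) line; latency_summary is a hypothetical helper, and it uses the nearest-rank percentile definition, one of several in common use.

```python
import math
import statistics


def latency_summary(times_ms, warmup=100):
    """Mean, median and p99 of per-request latencies, skipping warm-up requests."""
    steady = sorted(times_ms[warmup:])
    if not steady:
        raise ValueError("need more samples than warm-up requests")

    def percentile(p):
        # Nearest-rank percentile: the smallest sample with at least p% of
        # the samples at or below it (integer arithmetic avoids float drift).
        rank = max(1, math.ceil(p * len(steady) / 100))
        return steady[rank - 1]

    return {
        "mean": statistics.mean(steady),
        "p50": percentile(50),
        "p99": percentile(99),
    }
```

In the client above, print(latency_summary(times)) would then report all three numbers from the same 900 steady-state requests.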

My avg_time results (three runs each) are below:

original TensorFlow saved_model: 10.38 / 10.49 / 10.39 ms
TF saved_model optimized by TensorRT FP32: 7.72 / 7.76 / 7.79 ms
TF saved_model optimized by TensorRT FP16: 7.19 / 7.16 / 7.21 ms

So the TensorRT FP32/FP16 versions are clearly faster than the original TensorFlow model: average latency drops by about 26% (FP32) and 31% (FP16).
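The speedup quoted above can be checked against the reported averages. A small sketch (the compare helper is hypothetical; the numbers are the ones posted above):

```python
from statistics import mean

# Average latencies (ms) reported above for the stock ResNet-50 SavedModel.
RESNET_MS = {
    "tf": [10.38, 10.49, 10.39],
    "trt_fp32": [7.72, 7.76, 7.79],
    "trt_fp16": [7.19, 7.16, 7.21],
}


def compare(results, baseline="tf"):
    """Latency reduction (%) and speedup (x) of each variant versus the baseline."""
    base = mean(results[baseline])
    out = {}
    for name, runs in results.items():
        if name == baseline:
            continue
        m = mean(runs)
        out[name] = {"reduction_pct": 100 * (base - m) / base, "speedup": base / m}
    return out


for name, r in compare(RESNET_MS).items():
    print(f"{name}: {r['reduction_pct']:.1f}% lower latency, {r['speedup']:.2f}x")
```

For these numbers it gives roughly a 25.6% latency reduction (1.34x) for FP32 and 31.0% (1.45x) for FP16.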

Next I tested my own saved_model, which is based on ResNet-50 and adds some up-sampling ops; you can get it from https://github.com/IvyGongoogle/trt_of_tf. I optimized it with TensorRT FP32/FP16, served it with TensorFlow Serving, and used the same modified resnet_client_grpc.py to measure the inference time. The avg_time results are below:

original TensorFlow saved_model: 48.65 / 48.74 / 48.96 ms
TF saved_model optimized by TensorRT FP32: 47.46 / 47.33 / 47.19 ms
TF saved_model optimized by TensorRT FP16: 43.85 / 43.77 / 43.84 ms

First question: for this model, unfortunately, the inference speed after TensorRT FP32/FP16 optimization is nearly the same as the original TensorFlow model (about 3% lower latency with FP32 and 10% with FP16). What causes this? Is it that only a few ops in my model can be optimized by TensorRT, even though it is based on ResNet-50 (for which we saw a good speedup above)?

Second question: for both the original ResNet-50 and my own model, TensorRT FP16 is only slightly faster than TensorRT FP32. In theory, shouldn't FP16 be almost two times faster?
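One way to reason about the second question is Amdahl's law. The measured latency covers the whole gRPC round trip: request serialization, network, any in-graph JPEG decoding and preprocessing, and ops that stayed in TensorFlow, none of which FP16 accelerates. FP16 gains also depend on the GPU's FP16 throughput (for example Tensor Cores on Volta or Turing), and the GPU used here is not stated. The sketch below is a back-of-the-envelope estimate under an idealized, assumed 2x FP16 speedup of the TensorRT segments; it solves for the share of end-to-end time those segments would need to account for to explain the measured numbers. implied_fraction is a hypothetical helper.

```python
from statistics import mean

# Average latencies (ms) reported above.
FP32_MS = {"resnet50": [7.72, 7.76, 7.79], "own_model": [47.46, 47.33, 47.19]}
FP16_MS = {"resnet50": [7.19, 7.16, 7.21], "own_model": [43.85, 43.77, 43.84]}


def implied_fraction(overall_speedup, part_speedup):
    """Amdahl's law solved for f, the share of runtime in the accelerated part:
    1/S = (1 - f) + f/s  =>  f = (1 - 1/S) / (1 - 1/s)."""
    return (1 - 1 / overall_speedup) / (1 - 1 / part_speedup)


for model in FP32_MS:
    overall = mean(FP32_MS[model]) / mean(FP16_MS[model])
    # Idealized assumption: FP16 makes the TensorRT segments exactly 2x faster.
    f = implied_fraction(overall, part_speedup=2.0)
    print(f"{model}: FP16 vs FP32 {overall:.2f}x, implied share of time sped up ~{f:.0%}")
```

For both models the measured FP16/FP32 ratio is about 1.08x, which under that assumption implies that only about 15% of the measured latency is spent in work that FP16 speeds up.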

Can you give some advice? Looking forward to your reply.
