I got the ResNet-50 SavedModel for TensorFlow Serving with curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz, following the steps from https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a. Then I optimized this ResNet-50 SavedModel by running (using Docker):
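The conversion command is roughly the one from that blog post; the version directory below is reconstructed from my output path, so treat this as a sketch rather than the exact command:

```
docker run --rm --runtime=nvidia -it -v /tmp:/tmp tensorflow/tensorflow:nightly-gpu \
  /usr/local/bin/saved_model_cli convert \
  --dir /tmp/resnet/1538686847 \
  --output_dir /tmp/resnet_trt/1538686847 \
  --tag_set serve \
  tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True
```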
It shows no errors. But the directory /tmp/resnet_trt/1538686847/variables/ is empty: there is no variables.data-00000-of-00001 and no variables.index, and there is only a larger frozen graph file saved_model.pb (it looks like a TensorRT engine) under /tmp/resnet_trt/1538686847. Did this conversion fail? How can I fix it?
2019-03-26 11:43:20.122105: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 1
2019-03-26 11:43:20.299920: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_model/TRTEngineOp_0 added for segment 0 consisting of 199 nodes succeeded.
2019-03-26 11:43:20.358346: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-03-26 11:43:20.358418: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 308 nodes (-112), 363 edges (-114), time = 313.582ms.
2019-03-26 11:43:20.358432: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] layout: Graph size after: 308 nodes (0), 363 edges (0), time = 73.565ms.
2019-03-26 11:43:20.358445: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 308 nodes (0), 363 edges (0), time = 127.602ms.
2019-03-26 11:43:20.358457: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] TensorRTOptimizer: Graph size after: 110 nodes (-198), 157 edges (-206), time = 470.955ms.
But the directory /tmp/resnet_trt/1538686847/variables/ is still empty (no variables.data-00000-of-00001 and no variables.index), and there is still only a larger frozen graph file saved_model.pb (it looks like a TensorRT engine) under /tmp/resnet_trt/1538686847.
@NVES_R yes, you are right: the saved_model.pb under /tmp/resnet_trt/1538686847 is larger, and it looks like it is a TensorRT engine. I have updated my question.
Then I serve it with TensorFlow Serving and run python /tmp/resnet/resnet_client.py. It works, thank you very much. But I am wondering why a single frozen graph file, with no variables, can be served by TensorFlow Serving.
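For reference, the serving step follows the blog post and looks roughly like this (the image tag and mount paths here are just what I assume/used):

```
docker run --rm --runtime=nvidia -p 8500:8500 \
  -v /tmp/resnet_trt:/models/resnet \
  -e MODEL_NAME=resnet -t tensorflow/serving:latest-gpu
```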
I'm not too sure of TensorFlow Serving's capabilities, but I do know that TensorRT Inference Server can serve a configurable number of instances of multiple models. I would recommend checking it out; it is also available as a container on NGC.
@NVES_R thanks. After doing some tests, I found a problem:
Now I have a SavedModel of my own model with another signature_def, predict_images, in addition to the default signature_def serving_default. But after I convert it using the tensorflow/tensorflow:nightly-gpu Docker image:
I find that my new TensorRT-optimized saved_model.pb (under /tmp/myModel_trt/111111) no longer has the other signature_def predict_images; it has only the default signature_def serving_default. But if I use the nvcr.io/nvidia/tensorflow:19.03-py2 Docker image from NGC and run the above command, this problem does not appear.
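One way to confirm which signatures survive the conversion is to inspect the converted SavedModel with saved_model_cli, e.g.:

```
saved_model_cli show --dir /tmp/myModel_trt/111111 --tag_set serve
# or, for full input/output details of every signature:
saved_model_cli show --dir /tmp/myModel_trt/111111 --all
```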
It seems that the inference speed with TensorRT FP32/FP16 optimization is much faster than the original TensorFlow model, by about 30%.
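For context, my modified resnet_client_grpc.py measures the average latency roughly like the sketch below. The server address, model name, input name image_bytes, and image path are assumptions (they match the official resnet jpg SavedModel); adjust them for your own setup:

```python
import time
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build one Predict request; 'image_bytes' is the input name of the
# official resnet jpg SavedModel -- change it for your own model.
with open('cat.jpg', 'rb') as f:
    jpeg_bytes = f.read()
request = predict_pb2.PredictRequest()
request.model_spec.name = 'resnet'
request.model_spec.signature_name = 'serving_default'
request.inputs['image_bytes'].CopyFrom(
    tf.make_tensor_proto(jpeg_bytes, shape=[1]))

# Warm up once (with is_dynamic_op=True the TensorRT engine is built on the
# first request), then average the latency over many requests.
stub.Predict(request, 30.0)
num_runs = 100
start = time.time()
for _ in range(num_runs):
    stub.Predict(request, 30.0)
avg_ms = (time.time() - start) / num_runs * 1000.0
print('avg_time: %.2f ms' % avg_ms)
```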
Next I test my own ResNet-50-based SavedModel, which has some up-sampling ops; you can get my SavedModel from https://github.com/IvyGongoogle/trt_of_tf. I optimize it with TensorRT FP32/FP16 and serve it with TensorFlow Serving, then use the same modified resnet_client_grpc.py as above to test the inference time. The test results for avg_time are below:
First question: for this model, unfortunately the inference speed with TensorRT FP32/FP16 optimization is nearly the same as the original TensorFlow. What causes this? Is it that only a few ops in my model can be optimized by TensorRT, even though it is based on ResNet-50 (for which we saw a good speedup above)?
Second question: for the original ResNet-50 and for my own model, the inference speed with TensorRT FP16 is only slightly faster than TensorRT FP32, but shouldn't this ratio theoretically be almost two times?
Can you give some advice? Looking forward to your reply.