Hello, I downloaded the ResNet-50 SavedModel with:
curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/resnet -xvz
following the steps from https://medium.com/tensorflow/optimizing-tensorflow-serving-performance-with-nvidia-tensorrt-6d8a2347869a. I then used the nvcr.io/nvidia/tensorflow:19.03-py2 container
to optimize this SavedModel with TensorRT by running the following Docker command:
docker run --rm --runtime=nvidia -it --env CUDA_VISIBLE_DEVICES=2 -v /tmp:/tmp nvcr.io/nvidia/tensorflow:19.03-py2 /usr/local/bin/saved_model_cli convert --dir /tmp/resnet/1538687457 --output_dir /tmp/resnet_trt/1538687457 --tag_set serve tensorrt --precision_mode FP32 --max_batch_size 1 --is_dynamic_op True
The conversion prints the following log output:
2019-04-15 08:41:36.445773: I tensorflow/contrib/tensorrt/segment/segment.cc:461] There are 70 ops of 36 different types in the graph that are not converted to TensorRT: ArgMax, Exit, NextIteration, TensorArrayWriteV3, Slice, FloorDiv, Softmax, Squeeze, Pack, Range, Sub, Minimum, TensorArraySizeV3, Less, DecodeJpeg, Merge, ResizeBilinear, TensorArrayV3, TensorArrayScatterV3, Shape, Enter, NoOp, LoopCond, StridedSlice, TensorArrayReadV3, Transpose, LogicalAnd, TensorArrayGatherV3, Switch, Identity, Cast, Placeholder, Add, RealDiv, Mul, ExpandDims, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-04-15 08:41:36.496972: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 1
2019-04-15 08:41:36.910235: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_model/TRTEngineOp_0 added for segment 0 consisting of 441 nodes succeeded.
2019-04-15 08:41:37.038405: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-04-15 08:41:37.038483: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 550 nodes (-256), 613 edges (-258), time = 617.799ms.
2019-04-15 08:41:37.038504: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] layout: Graph size after: 557 nodes (7), 615 edges (2), time = 160.261ms.
2019-04-15 08:41:37.038517: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 552 nodes (-5), 615 edges (0), time = 479.515ms.
2019-04-15 08:41:37.038533: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] TensorRTOptimizer: Graph size after: 112 nodes (-440), 159 edges (-456), time = 824.339ms.
As you can see, some ops such as Softmax, Sub, Add, Identity, Mul, Slice and others are not converted to TensorRT. However, these TensorFlow ops are listed as supported in the docs at https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#supported-ops. What causes this? Can anyone give some advice?
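
To make the comparison against the support matrix easier, the unconverted op types can be pulled out of the segment.cc log line with a small script. This is only a rough sketch: the helper name `unconverted_op_types` and the `OPS_OF_INTEREST` list are hypothetical, and the regex assumes the exact log wording shown above, which may differ in other TensorFlow versions.

```python
import re

# The segment.cc message from the log above (timestamp prefix omitted).
LOG_LINE = (
    "There are 70 ops of 36 different types in the graph that are not "
    "converted to TensorRT: ArgMax, Exit, NextIteration, TensorArrayWriteV3, "
    "Slice, FloorDiv, Softmax, Squeeze, Pack, Range, Sub, Minimum, "
    "TensorArraySizeV3, Less, DecodeJpeg, Merge, ResizeBilinear, "
    "TensorArrayV3, TensorArrayScatterV3, Shape, Enter, NoOp, LoopCond, "
    "StridedSlice, TensorArrayReadV3, Transpose, LogicalAnd, "
    "TensorArrayGatherV3, Switch, Identity, Cast, Placeholder, Add, RealDiv, "
    "Mul, ExpandDims, (For more information see "
    "https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops)."
)

# Hypothetical list: the ops named in the question that the support matrix lists.
OPS_OF_INTEREST = ["Softmax", "Sub", "Add", "Identity", "Mul", "Slice"]


def unconverted_op_types(line):
    """Return the op type names listed in a 'not converted to TensorRT' log line."""
    match = re.search(
        r"not converted to TensorRT: (.*?),?\s*\(For more information", line
    )
    if match is None:
        return []
    # Split the comma-separated list and drop empty entries.
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


if __name__ == "__main__":
    ops = unconverted_op_types(LOG_LINE)
    print("unconverted op types:", len(ops))
    for name in OPS_OF_INTEREST:
        status = "not converted" if name in ops else "not in the list"
        print(name, "->", status)
```

The script only reports which op types appear in the log message. It does not explain why a given node was left out of the TensorRT segment.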