Inference speed does not improve between FP32 and FP16 when using tensorflow.contrib.tensorrt

Hi,

I tried trt.create_inference_graph with both precision_mode options, “FP32” and “FP16”, and converted the input between FP32 and FP16 accordingly.
However, the inference speed did not improve. Under which conditions does precision_mode help?

Thanks.

Hi,

It’s recommended to first check how many layers are accelerated with TensorRT.
If most of the layers use the TensorFlow implementation, FP32 and FP16 will have similar performance.

You can find this information in the TensorFlow log:

2019-03-12 05:28:27.472767: I tensorflow/contrib/tensorrt/segment/segment.cc:461] There are 2329 ops of 32 different types in the graph that are not converted to TensorRT: Fill, Merge, Switch, Range, ConcatV2, ZerosLike, Identity, NonMaxSuppressionV3, Minimum, StridedSlice, Shape, Split, Where, Exp, ExpandDims, Unpack, GatherV2, NoOp, TopKV2, Cast, Placeholder, Mul, Pack, Reshape, ResizeBilinear, Squeeze, Add, Greater, Const, Sub, Transpose, Slice, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-03-12 05:28:27.834501: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 1
2019-03-12 05:29:04.491687: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 434 nodes succeeded.
2019-03-12 05:29:04.690552: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:581] Optimization results for grappler item: tf_graph
2019-03-12 05:29:04.690753: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 6502 nodes (-1051), 8564 edges (-1660), time = 1286.46204ms.
2019-03-12 05:29:04.690883: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   layout: Graph size after: 6517 nodes (15), 8590 edges (26), time = 412.909ms.
2019-03-12 05:29:04.690958: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 6517 nodes (0), 8590 edges (0), time = 445.286ms.
2019-03-12 05:29:04.691023: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   TensorRTOptimizer: Graph size after: 6084 nodes (-433), 8088 edges (-502), time = 37475.3047ms.
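As a rough way to read the log above, the sketch below pulls out the two numbers that matter: how many ops were left in TensorFlow and how many nodes went into TensorRT engines. The function name summarize_trt_log and the regular expressions are illustrative assumptions based only on the message wording in this log; other TensorFlow versions may word these lines differently. The ratio is only an estimate, since the two counts are taken at different stages of graph optimization.

```python
import re

# Patterns assumed from the message wording in the log above.
NOT_CONVERTED = re.compile(
    r"There are (\d+) ops of (\d+) different types in the graph "
    r"that are not converted to TensorRT"
)
SEGMENT = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) "
    r"consisting of (\d+) nodes succeeded"
)


def summarize_trt_log(log_text):
    """Return converted / not-converted counts found in a TF-TRT log."""
    not_converted = sum(int(m.group(1)) for m in NOT_CONVERTED.finditer(log_text))
    converted = sum(int(m.group(3)) for m in SEGMENT.finditer(log_text))
    total = converted + not_converted
    ratio = converted / total if total else 0.0
    return {
        "converted": converted,
        "not_converted": not_converted,
        "converted_ratio": ratio,
    }


# Message excerpts taken from the log above (op list omitted).
sample_log = (
    "There are 2329 ops of 32 different types in the graph "
    "that are not converted to TensorRT: Fill, Merge, Switch\n"
    "TensorRT node TRTEngineOp_0 added for segment 0 "
    "consisting of 434 nodes succeeded.\n"
)

summary = summarize_trt_log(sample_log)
print(summary["converted"], summary["not_converted"])
print("converted ratio: %.3f" % summary["converted_ratio"])
```

In this log only 434 nodes ended up in a TensorRT engine while 2329 ops stayed in TensorFlow, so most of the graph would run with the same TensorFlow kernels in either precision mode.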

Thanks.

Hi,

Do you know how to count the number of ops in TF? I found this script but am not sure it is correct.
Could you check it?

import tensorflow as tf

flops = tf.profiler.profile(tf.get_default_graph(), options=tf.profiler.ProfileOptionBuilder.float_operation())
print('FLOP = ', flops.total_float_ops)

Thanks.

Hi,

Try this. It returns the list of operations in the session's graph, so its length is the op count:

tf_sess.graph.get_operations()
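Building on that call, here is a minimal sketch for counting ops by type. It relies only on each operation exposing a `type` attribute, as tf.Operation does. The helper names count_op_types and trt_summary are hypothetical, and the op type name "TRTEngineOp" is an assumption inferred from the node name TRTEngineOp_0 in the log earlier in this thread.

```python
from collections import Counter


def count_op_types(operations):
    """Count operations by their `type` attribute."""
    return Counter(op.type for op in operations)


def trt_summary(operations, trt_op_type="TRTEngineOp"):
    """Return the total op count and the number of TensorRT engine ops.

    `trt_op_type` is an assumed op type name, not confirmed by the thread.
    """
    counts = count_op_types(operations)
    return {
        "total_ops": sum(counts.values()),
        "trt_engine_ops": counts.get(trt_op_type, 0),
        "by_type": counts,
    }


# Usage with a TensorFlow 1.x session (here assumed to be named tf_sess):
#   summary = trt_summary(tf_sess.graph.get_operations())
#   print(summary["total_ops"], summary["trt_engine_ops"])
#   print(summary["by_type"].most_common(10))
```

Note that this counts graph operations, which is different from the FLOP total reported by tf.profiler.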

Thanks.