Precision is NOT changed by TensorRT graph convert API.

Hi, I've tried to optimize a GNMT model from FP32 to FP16 to run on TensorRT, using create_inference_graph() or TrtGraphConverter(). But although I set precision_mode to FP16, the converted graph is not in FP16:
the DTYPE of the model is still DT_FLOAT, not DT_HALF, and the model size is the same before (FP32) and after (FP16) conversion.
Why is the precision not changed?


My code and environment are as below:

  1. Environment
    1. Linux version: Ubuntu 16.04
    2. GPU: Tesla V100
    3. NVIDIA driver version: 410.79
    4. CUDA version: 10.0
    5. cuDNN version: 7.5
    6. Python version: 3.6
    7. TensorFlow version: 1.14.0
    8. TensorRT version: 5.1.5.0
  2. Code
    from tensorflow.python.compiler.tensorrt import trt_convert
    converter = trt_convert.TrtGraphConverter(
        input_graph_def=frozen_graph,
        nodes_blacklist=['softmax_cross_entropy_with_logits_sg/Reshape_2'],
        max_batch_size=32,
        precision_mode='FP16',
        minimum_segment_size=7,
        use_calibration=False,
        is_dynamic_op=True)
    trt_graph = converter.convert()
OR

    import tensorflow.contrib.tensorrt as trt
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=['softmax_cross_entropy_with_logits_sg/Reshape_2'],
        max_batch_size=32,
        max_workspace_size_bytes=4096 << 20,
        precision_mode='FP16',
        minimum_segment_size=7,
        is_dynamic_op=True)
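For context: TF-TRT replaces converted subgraphs with TRTEngineOp nodes, and with is_dynamic_op=True the FP16 engines are built lazily at runtime, so the serialized GraphDef can still list DT_FLOAT tensors even after a successful FP16 conversion. A minimal sketch of a sanity check (my assumption of how to verify that conversion happened at all) is to count TRTEngineOp nodes in the GraphDef that convert() returns:

```python
# Sketch of a conversion sanity check (an assumption, not part of the
# original code): if convert() produced no TRTEngineOp nodes, TF-TRT
# rejected every candidate segment and the precision setting never applied.

def count_trt_engine_ops(graph_def):
    """Count TRT-converted subgraphs in a GraphDef-like object.

    Works on any object exposing a `node` sequence whose items have an
    `op` field, e.g. the GraphDef returned by converter.convert().
    """
    return sum(1 for node in graph_def.node if node.op == "TRTEngineOp")

# Usage (assuming `trt_graph` is the GraphDef from convert()):
#   print("TRTEngineOp nodes:", count_trt_engine_ops(trt_graph))
```

If the count is zero, the minimum_segment_size or unsupported ops likely prevented any segment from being converted.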