Hi, I've tried to optimize a GNMT model from FP32 to FP16 to run it with TensorRT, using either create_inference_graph() or TrtGraphConverter(). But although I set precision_mode to 'FP16', the converted graph does not appear to be in FP16:
the dtype of the model's nodes is still DT_FLOAT rather than DT_HALF, and the model size is the same before (FP32) and after (FP16) conversion.
Why isn't the precision changed?
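For reference, this is roughly how I check the node dtypes, using the trt_graph returned by the conversion code further below (just a minimal sketch):

import collections
import tensorflow as tf

def count_node_dtypes(graph_def):
    """Count the type attributes ('T' / 'dtype') of all nodes in a GraphDef."""
    counts = collections.Counter()
    for node in graph_def.node:
        for attr_name in ('T', 'dtype'):
            # Skip nodes without this attribute or with an unset (DT_INVALID) type.
            if attr_name in node.attr and node.attr[attr_name].type:
                counts[tf.DType(node.attr[attr_name].type).name] += 1
    return counts

# After conversion this still reports only float32 nodes, no float16.
print(count_node_dtypes(trt_graph))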
My environment and conversion code are as below:
- ENV
- Linux version: Ubuntu 16.04
- GPU: Tesla V100
- NVIDIA driver version: 410.79
- CUDA version: 10.0
- cuDNN version: 7.5
- Python version: 3.6
- TensorFlow version: 1.14.0
- TensorRT version: 5.1.5.0
- Code
from tensorflow.python.compiler.tensorrt import trt_convert

converter = trt_convert.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=['softmax_cross_entropy_with_logits_sg/Reshape_2'],
    max_batch_size=32,
    precision_mode='FP16',
    minimum_segment_size=7,
    use_calibration=False,
    is_dynamic_op=True)
trt_graph = converter.convert()
OR
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=['softmax_cross_entropy_with_logits_sg/Reshape_2'],
    max_batch_size=32,
    max_workspace_size_bytes=4096 << 20,
    precision_mode='FP16',
    minimum_segment_size=7,
    is_dynamic_op=True)
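For completeness, this is how I inspect and save the result in both cases (again just a sketch; I'm assuming the converted segments show up as TRTEngineOp nodes carrying a precision_mode attribute, and the output path is only a placeholder):

import tensorflow as tf

# Count how many TRT engine nodes were created and dump their precision_mode
# attribute (attribute name assumed here).
trt_engine_nodes = [n for n in trt_graph.node if n.op == 'TRTEngineOp']
print('num TRTEngineOps:', len(trt_engine_nodes))
for n in trt_engine_nodes:
    print(n.name, n.attr['precision_mode'].s)

# Serialize the converted graph to compare file size with the FP32 frozen graph.
with tf.gfile.GFile('gnmt_trt_fp16.pb', 'wb') as f:  # placeholder output path
    f.write(trt_graph.SerializeToString())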