Little speedup when converting FP32 (frozen model) to FP16 for YOLOv3 inference

Hi, I used create_inference_graph to convert an FP32 frozen model to FP16, but GPU memory usage is higher than with FP32 (FP16: 17911 MB, FP32: 5445 MB), and inference is barely faster: 0.036 s/img for FP16 vs. 0.039 s/img for FP32. Is this normal?

My conversion code is below:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.x TF-TRT API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(config=config)

# load the frozen FP32 graph
with tf.gfile.GFile('./yolov3_coco.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

return_elements = ["input/input_data:0", "pred_sbbox/concat_2:0",
                   "pred_mbbox/concat_2:0", "pred_lbbox/concat_2:0"]

trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=return_elements,
    max_batch_size=32,
    max_workspace_size_bytes=2 << 20,  # 2 << 20 bytes = 2 MiB workspace
    is_dynamic_op=True,
    precision_mode='FP16')

self.return_tensors = tf.import_graph_def(
    trt_graph,
    return_elements=return_elements)
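
For reference, this is roughly how I check the conversion result and measure the per-image time. It is only a sketch: it assumes the class context above, and the 416x416 input size and loop counts are placeholders, not the exact benchmark script:

import time
import numpy as np

# sanity check: how many TensorRT engine ops did the conversion create?
num_engines = sum(1 for n in trt_graph.node if n.op == 'TRTEngineOp')
print('TRTEngineOp nodes: %d' % num_engines)

# dummy input; 416x416x3 is an assumption (a common YOLOv3 input size)
dummy = np.random.rand(1, 416, 416, 3).astype(np.float32)
input_tensor, outputs = self.return_tensors[0], self.return_tensors[1:]

# warm-up: with is_dynamic_op=True the TRT engines are built on the
# first runs, so those runs are excluded from the timing
for _ in range(10):
    self.sess.run(outputs, feed_dict={input_tensor: dummy})

start = time.time()
for _ in range(100):
    self.sess.run(outputs, feed_dict={input_tensor: dummy})
print('per-image time: %.4fs' % ((time.time() - start) / 100))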