Little speedup when converting FP32 (frozen model) to FP16 for YOLOv3 inference

Hi, I used create_inference_graph to convert an FP32 frozen model to FP16, but GPU memory usage is higher than with FP32 (FP16: 17911 MB, FP32: 5445 MB), and inference is barely faster: 0.036 s/img for FP16 vs. 0.039 s/img for FP32. Is this normal?

My conversion code is below:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF 1.x TF-TRT API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
self.sess = tf.Session(config=config)

# load the frozen FP32 graph
with tf.gfile.GFile('./yolov3_coco.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

return_elements = ["input/input_data:0", "pred_sbbox/concat_2:0",
                   "pred_mbbox/concat_2:0", "pred_lbbox/concat_2:0"]

trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=return_elements,
    max_batch_size=32,
    max_workspace_size_bytes=2 << 20,  # 2 << 20 bytes = 2 MiB workspace
    is_dynamic_op=True,
    precision_mode='FP16')

self.return_tensors = tf.import_graph_def(
    trt_graph,
    return_elements=return_elements)
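
For reference, this is roughly how I check the conversion result and measure the per-image time. It is only a sketch: it assumes the class context above, and the 416x416 input size and loop counts are placeholders, not the exact benchmark script:

import time
import numpy as np

# sanity check: how many TensorRT engine ops did the conversion create?
num_engines = sum(1 for n in trt_graph.node if n.op == 'TRTEngineOp')
print('TRTEngineOp nodes: %d' % num_engines)

# dummy input; 416x416x3 is an assumption (a common YOLOv3 input size)
dummy = np.random.rand(1, 416, 416, 3).astype(np.float32)
input_tensor, outputs = self.return_tensors[0], self.return_tensors[1:]

# warm-up: with is_dynamic_op=True the TRT engines are built on the
# first runs, so those runs are excluded from the timing
for _ in range(10):
    self.sess.run(outputs, feed_dict={input_tensor: dummy})

start = time.time()
for _ in range(100):
    self.sess.run(outputs, feed_dict={input_tensor: dummy})
print('per-image time: %.4fs' % ((time.time() - start) / 100))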