Questions about cuDNN 7.2 performance on V100

On https://developer.nvidia.com/cudnn I saw the claim that a V100 with cuDNN 7.2 (mixed precision) can process about 700+ images/sec, while a V100 with cuDNN 7.0 (mixed precision) can only process about 200+ images/sec. I tested cuDNN 7.0 and cuDNN 7.2 on my V100 using the TensorFlow benchmark at https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks, but my results are very different from those numbers. My results are listed below. Only the run without fp16 improves noticeably when switching from cuDNN 7.0 to 7.2 (339 -> 433 images/sec); with --use_fp16 the throughput barely changes.

Why is that? Can anyone explain? Thanks!

cmd:
CUDA_VISIBLE_DEVICES='7' python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server --use_fp16

My result:
cudnn 7.0:
total images/sec: 630.29

cudnn 7.2 (CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION modified in the TensorFlow source code):
total images/sec: 643.68

cmd:
CUDA_VISIBLE_DEVICES='7' python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server

My result:
cudnn 7.0:
total images/sec: 339.56

cudnn 7.2 (CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION modified in the TensorFlow source code):
total images/sec: 433.82
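
For reference, the relative gains in the numbers above can be worked out with a short script. This is only a sketch of the arithmetic: the `results` dict and the `percent_gain` helper are names made up for illustration and are not part of tf_cnn_benchmarks.

```python
# Throughput figures (images/sec) copied from the four runs above.
# The dict layout and key names are illustrative assumptions.
results = {
    "with --use_fp16": {"cudnn 7.0": 630.29, "cudnn 7.2": 643.68},
    "without --use_fp16": {"cudnn 7.0": 339.56, "cudnn 7.2": 433.82},
}


def percent_gain(old, new):
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


for mode, runs in results.items():
    old, new = runs["cudnn 7.0"], runs["cudnn 7.2"]
    gain = percent_gain(old, new)
    print(f"{mode}: {old:.2f} -> {new:.2f} images/sec ({gain:+.1f}%)")
```

So going from cudnn 7.0 to 7.2 gives roughly a 2.1% gain with --use_fp16 and roughly a 27.8% gain without it, nowhere near the jump described on the NVIDIA page.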