On the cuDNN page, [url]https://developer.nvidia.com/cudnn[/url], I saw a claim that V100 + cudnn 7.2 (mixed) can process 700+ images/sec, while V100 + cudnn 7.0 (mixed) can only process 200+ images/sec. I tested cudnn 7.0 and cudnn 7.2 on my V100 with the TensorFlow benchmark, [url]https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks[/url], but my results are very different from that. They are listed below: with fp16 the two versions perform almost the same, and only without fp16 does changing cudnn 7.0 to 7.2 give a clear gain, to about 433 images/sec.
Why is that? Can anyone explain? Thanks!
cmd:
CUDA_VISIBLE_DEVICES='7' python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server --use_fp16
My result:
cudnn 7.0:
total images/sec: 630.29
cudnn 7.2 (with CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION modified in the TensorFlow source code):
total images/sec: 643.68
cmd:
CUDA_VISIBLE_DEVICES='7' python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
My result:
cudnn 7.0:
total images/sec: 339.56
cudnn 7.2 (with CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION modified in the TensorFlow source code):
total images/sec: 433.82
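
To make the comparison easier to read, here is a small sketch that turns the four numbers above into speedup ratios. The figures are copied from my results. The "fp32" label is my assumption for the run without --use_fp16, and the names `results` and `speedup` are just made up for this example.

```python
# Throughput figures (images/sec) copied from the results above.
# "fp32" is an assumed label for the run without --use_fp16.
results = {
    ("fp16", "7.0"): 630.29,
    ("fp16", "7.2"): 643.68,
    ("fp32", "7.0"): 339.56,
    ("fp32", "7.2"): 433.82,
}


def speedup(precision, old="7.0", new="7.2"):
    """Ratio of throughput on the new cudnn version to the old one."""
    return results[(precision, new)] / results[(precision, old)]


for precision in ("fp16", "fp32"):
    print(f"{precision}: cudnn 7.0 -> 7.2 speedup = {speedup(precision):.3f}x")
```

This shows only a few percent gain with fp16 but a much larger one without it, which is the opposite of what I expected from the NVIDIA page.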