P100 is much slower than P4???

fengyuann · August 15, 2018, 9:13am

I benchmarked a convolution on a P100 and a P4 with TensorFlow 1.8, using the following op:

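import tensorflow as tf  # TensorFlow 1.8
# Input in NCHW layout: batch 64, 512 channels, 55x55 spatial.
x = tf.Variable(tf.random_normal((64, 512, 55, 55), dtype=tf.float32))
# Filter in HWIO layout: 3x3 kernel, 512 input channels, 512 output channels.
f = tf.Variable(tf.random_normal((3, 3, 512, 512), dtype=tf.float32))
conv_op = tf.nn.conv2d(x, f, [1, 1, 1, 1], 'SAME', data_format='NCHW')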

I generated timeline.json on both the P100 and the P4. For the convolution op it shows:

Metric                   P4           P100
Occurrences              5            5
Wall duration            77.903 ms    122.510 ms
Average wall duration    15.561 ms    23.502 ms

How can the convolution take so much longer on the P100 than on the P4? The advertised peak throughput figures are:

GPU     Double precision    Single precision    Half precision
P4      -                   5.5 teraFLOPS       22 teraFLOPS
P100    4.7 teraFLOPS       9.3 teraFLOPS       18.7 teraFLOPS
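
One way to sanity-check the timings against the peak figures above is to count the arithmetic in the convolution and divide by the measured time. The sketch below is only an estimate under stated assumptions: it treats the op as a direct convolution, counts 2 FLOPs per multiply-add, and counts padded border positions as full work. The helper names `conv2d_flops` and `implied_tflops` are made up for this illustration and are not part of TensorFlow.

```python
# Shapes taken from the post: NCHW input, HWIO filter, stride 1, 'SAME' padding.
N, C, H, W = 64, 512, 55, 55   # batch, input channels, height, width
R, S, K = 3, 3, 512            # filter height, filter width, output channels


def conv2d_flops(n, c, h, w, r, s, k):
    """FLOPs of a direct stride-1 'SAME' convolution (hypothetical helper).

    Assumes 2 FLOPs per multiply-add and an output the same size as the input.
    """
    return 2 * n * k * h * w * c * r * s


def implied_tflops(flops, seconds):
    """Throughput in teraFLOPS implied by doing `flops` work in `seconds`."""
    return flops / seconds / 1e12


flops = conv2d_flops(N, C, H, W, R, S, K)

# Average wall durations from the timeline table, in milliseconds.
avg_ms = {"P4": 15.561, "P100": 23.502}
# Single-precision peaks quoted in the post, in teraFLOPS.
peak_fp32 = {"P4": 5.5, "P100": 9.3}

for gpu in ("P4", "P100"):
    achieved = implied_tflops(flops, avg_ms[gpu] / 1e3)
    print(f"{gpu}: {achieved:.1f} TFLOPS implied vs {peak_fp32[gpu]} TFLOPS peak")
```

With the numbers from the post, the implied throughput comes out above the quoted single-precision peak on both cards. That suggests either that the selected algorithm does far fewer operations than a direct convolution (cuDNN may pick a Winograd or FFT variant), or that the wall duration in the timeline does not cover the full kernel execution. Either way, the two timings may not be directly comparable to the peak teraFLOPS figures.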
