Titan V slower than Titan Xp when batch size is small?

Recently I’ve been trying to benchmark the new Titan V card against the Titan Xp. My expectation was that Titan V would be faster than Titan Xp in most circumstances, but to my surprise I found that Titan V is actually slower than the Xp when batch size == 1.

  • Driver Version: 384.111

  • CUDA Version: CUDA9.0 + CUDNN7.1.4

  • Framework: TensorFlow 1.10 and NVCaffe 0.17

  • Test Method: <tensorflow/benchmarks>

  • Test Results on TensorFlow (units: ms/img, smaller is better)

    batch_size = 16    VGG16    Inception V3    ResNet50
    Titan V            6.0      7.0             4.7
    Titan Xp           7.5      8.3             5.5

    batch_size = 1     VGG16    Inception V3    ResNet50
    Titan V            26.3     43.5            32.3
    Titan Xp           30.3     38.5            28.6
  • Test Results on NVCaffe

    batch_size = 16    VGG16    Inception V3    ResNet50
    Titan V            30.4     47.1            35.8
    Titan Xp           33.9     53.0            41.2

    batch_size = 1     VGG16    Inception V3    ResNet50
    Titan V            6.53     24.8            14.1
    Titan Xp           5.45     22.6            12.1
  • As we can see, with batch size 16 Titan V is consistently 10–15% faster than Titan Xp. With batch size 1, however, Titan Xp is actually faster than Titan V on Inception V3 and ResNet50.

  • How could Titan V perform worse than Titan Xp? Is my benchmarking method flawed, or is something else going on? I’ve read most of the Titan V performance review articles I could find, and they all report latency at large batch sizes, so I can’t tell whether this is expected behavior.
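In case it helps with reproducing: the ms/img numbers above come from a timing loop along the lines of the sketch below. This is a framework-agnostic simplification, not the exact tensorflow/benchmarks code; `fake_forward` is a placeholder for a real forward pass (e.g. a `sess.run(...)` call), and warm-up iterations are discarded so one-time costs don’t skew the averages.

```python
import time

def per_image_latency_ms(run_batch, batch_size, warmup=10, iters=50):
    """Time a batched inference callable and report latency per image in ms.

    run_batch: callable that runs one forward pass over `batch_size` images.
    Warm-up iterations are discarded so one-time costs (kernel autotuning,
    memory allocation) do not skew the measurement.
    """
    for _ in range(warmup):
        run_batch(batch_size)
    start = time.perf_counter()
    for _ in range(iters):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    # Average per iteration, then divide by batch size to get ms/img.
    return elapsed / iters / batch_size * 1000.0

def fake_forward(batch_size):
    """Placeholder for a real model forward pass (NOT the actual benchmark)."""
    time.sleep(0.0005 * batch_size)  # pretend each image costs ~0.5 ms

if __name__ == "__main__":
    for bs in (1, 16):
        lat = per_image_latency_ms(fake_forward, bs, warmup=2, iters=10)
        print("batch_size=%d: %.2f ms/img" % (bs, lat))
```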

Thanks.