Recently I have been benchmarking the new Titan V card and comparing it with the Titan Xp. My initial expectation was that the Titan V would be faster than the Titan Xp in most circumstances, but to my surprise I found that the Titan V is actually slower than the Xp when batch size == 1.
- Driver Version: 384.111
- CUDA Version: CUDA 9.0 + cuDNN 7.1.4
- Framework: TensorFlow 1.10 and NVCaffe 0.17
- Test Method: <tensorflow/benchmarks> and
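For reference, this is roughly how I invoke the TensorFlow test (a sketch using the tf_cnn_benchmarks script from the tensorflow/benchmarks repo; the exact flag values here are illustrative, not my full configuration):

```shell
# Single-GPU latency run at batch size 1 with tf_cnn_benchmarks;
# repeat with --batch_size=16 and --model=vgg16 / inception3 as needed.
python tf_cnn_benchmarks.py \
    --num_gpus=1 \
    --model=resnet50 \
    --batch_size=1 \
    --num_batches=100 \
    --variable_update=parameter_server
```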
Test Results on TensorFlow (units are ms/img; smaller is better)
batch_size = 16    VGG16   Inception V3   ResNet50
Titan V            6.0     7.0            4.7
Titan Xp           7.5     8.3            5.5

batch_size = 1     VGG16   Inception V3   ResNet50
Titan V            26.3    43.5           32.3
Titan Xp           30.3    38.5           28.6
Test Results on NVCaffe
batch size = 16    VGG16   Inception V3   ResNet50
Titan V            30.4    47.1           35.8
Titan Xp           33.9    53             41.2

batch size = 1     VGG16   Inception V3   ResNet50
Titan V            6.53    24.8           14.1
Titan Xp           5.45    22.6           12.1
As you can see, at batch size = 16 the Titan V is consistently 10-15% faster than the Titan Xp; however, at batch size = 1, the Titan Xp is actually faster than the Titan V on Inception V3 and ResNet50.
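One thing I understand about ms/img numbers is that they fold in fixed per-launch overhead, which larger batches amortize. The toy harness below (pure NumPy on CPU, a stand-in I wrote for illustration, not the actual benchmark; `per_image_latency` is a hypothetical helper) shows the kind of measurement involved:

```python
import timeit
import numpy as np

def per_image_latency(batch_size, dim=1024, repeats=50):
    """Time one dense layer (a matmul) and report ms per image.

    A stand-in for one network layer: each call carries fixed
    overhead, so larger batches tend to lower the per-image cost.
    """
    x = np.random.rand(batch_size, dim).astype(np.float32)
    w = np.random.rand(dim, dim).astype(np.float32)
    total = timeit.timeit(lambda: x @ w, number=repeats)
    return total / repeats / batch_size * 1000.0  # ms per image

if __name__ == "__main__":
    for bs in (1, 16):
        print(f"batch_size={bs}: {per_image_latency(bs):.3f} ms/img")
```

On a GPU the effect is stronger: at batch size 1 a wide chip like the Titan V cannot fill all of its SMs, so raw compute advantage matters less than launch and scheduling overhead.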
How could the Titan V perform worse than the Titan Xp? Is my benchmark method problematic, or is something else going on? I have read most of the Titan V performance review articles I could find, and they all report latency at large batch sizes, so I have no idea whether this is expected behavior.
Thanks.