cuDNN may be slower?

A question about Caffe performance with cuDNN.
Caffe with cuDNN is 1.6x faster than Caffe without cuDNN when the batch size is 64. However, when I set the batch size to 1, Caffe with cuDNN is 2x slower. Does cuDNN have heavy overhead?
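For reference, here is the kind of timing harness I use to get these numbers. It is only a sketch: `forward_fn` stands in for whatever actually runs one forward pass (e.g. a pycaffe `net.forward` call), and the warmup is there so one-time setup costs don't pollute the average.

```python
import time

def avg_forward_time(forward_fn, iters=50, warmup=5):
    """Average wall-clock seconds per call of forward_fn.

    Warmup iterations exclude one-time costs such as memory
    allocation or cuDNN algorithm selection from the measurement.
    """
    for _ in range(warmup):
        forward_fn()
    start = time.perf_counter()
    for _ in range(iters):
        forward_fn()
    return (time.perf_counter() - start) / iters
```

With a real Caffe net this would be called as, for example, `avg_forward_time(lambda: net.forward())`, once with a cuDNN build and once without, at each batch size.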

Any help would be much appreciated!

Do you use convolution layers? If so, your findings align with the existing literature.

Search for the paper “cuDNN: Efficient Primitives for Deep Learning” (Chetlur, Sharan, et al.).

In that paper, Figure 2 gives you a rough idea of the performance of cuDNN convolutions vs. batch size (blue line in the graph = cuDNN, red line = Caffe).



Hi t3l,
Thanks a lot!
Yes, I use convolution layers.
So cuDNN is not well suited for single-image prediction with Caffe.
The paper does not explain the reason. My guess is that cuDNN needs more cudaMemcpy calls to prepare its input data.
I think there should be a dynamic switch so that cuDNN is not invoked when the incoming batch size is too small.
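Until such an automatic switch exists, a manual workaround is possible: Caffe lets you choose the convolution implementation per layer via the `engine` field in the prototxt (assuming a Caffe build compiled with cuDNN support; parameter names below otherwise follow Caffe's layer definition format, and the layer name and sizes are just placeholders):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    # Force the native Caffe GPU path instead of cuDNN,
    # e.g. for a deploy net that runs with batch size 1.
    engine: CAFFE   # alternatives: DEFAULT, CUDNN
  }
}
```

A deploy prototxt for small-batch inference could set `engine: CAFFE` on its convolution layers while the training prototxt keeps cuDNN.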


The upcoming cuDNN v4 has improved performance for batchSize=1 on the Maxwell architecture. Stay tuned.