NVIDIA Quadro performance

This seems like the most suitable forum for my question. I run the Caffe library (http://caffe.berkeleyvision.org/) with cuDNN. My previous graphics card was an NVIDIA Quadro K2000 with 384 GPU cores. I have now switched to an NVIDIA Quadro K4200 with 1344 GPU cores. But when I run Caffe, the K4200 doesn't show a significant speed improvement. I expected a 3-4x speedup and wonder why I'm not seeing it.

What kind of task are you benchmarking? Have you profiled the code? Is the task compute bound or memory bound? What speedup are you seeing, exactly? Are both cards running in the same machine, with only the GPU swapped? Is it possible this code is limited by host performance, or by copies between the host and the device?
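
One quick way to answer the host-vs-device question is to time the two phases of your pipeline separately. A minimal sketch (the stage functions below are hypothetical placeholders; substitute your actual preprocessing code and your Caffe forward pass, and note that real GPU calls are asynchronous, so a device synchronize is needed before stopping the timer):

```python
import time

# Hypothetical stand-ins for the real pipeline stages; replace with your
# actual CPU preprocessing and your Caffe forward pass. For real GPU work,
# synchronize the device before reading the clock.
def preprocess_on_cpu(batch):
    return [x * 2 for x in batch]   # placeholder CPU work

def run_on_gpu(batch):
    return sum(batch)               # placeholder for net.forward()

def timed(fn, *args):
    """Return (result, elapsed_seconds) for one call."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

batch = list(range(1000))
pre, t_cpu = timed(preprocess_on_cpu, batch)
_, t_gpu = timed(run_on_gpu, pre)
total = t_cpu + t_gpu
print("CPU fraction: %.0f%%, GPU fraction: %.0f%%"
      % (100 * t_cpu / total, 100 * t_gpu / total))
```

If the CPU fraction dominates, swapping in a faster GPU will barely move the total time.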

An expectation of about a 3x speedup seems reasonable under the assumption that the code spends almost all of its time doing work on the GPU: the published specifications indicate that both the compute throughput and the memory bandwidth of the K4200 are roughly triple those of the K2000.
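
The "almost all of its time on the GPU" caveat is exactly Amdahl's law: if only a fraction of the total runtime is GPU work, a 3x faster GPU speeds up only that fraction. A small sketch (the 3x factor comes from the spec comparison above):

```python
def overall_speedup(gpu_fraction, gpu_speedup=3.0):
    """Amdahl's law: speedup of the whole run when only `gpu_fraction`
    of the original runtime is accelerated by `gpu_speedup`."""
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)

# If 90% of the time is GPU work, a 3x faster GPU gives 2.5x overall;
# if only 10% is GPU work, the overall gain is barely measurable (~1.07x).
for f in (0.9, 0.5, 0.1):
    print("GPU fraction %.0f%% -> overall speedup %.2fx"
          % (100 * f, overall_speedup(f)))
```

An observed speedup near 1x therefore suggests the GPU fraction of this workload is small.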

Yes, same machine; I just swapped the graphics card. I run images: the K2000 takes about 200 ms per image, and the K4200 also takes about 200 ms. From this discussion I realize the processing has both GPU and CPU parts. The GPU part may have improved, but the run is bottlenecked by the CPU processing.
Thanks, let me check.

How can I check whether the code is compute bound or memory bound? Thanks.
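
A rough back-of-the-envelope check is the roofline model: compare a kernel's arithmetic intensity (FLOPs per byte of memory traffic, which a CUDA profiler such as nvprof can help you estimate) against the card's ratio of peak FLOPs to peak memory bandwidth. A sketch using approximate published K4200 numbers (~2.1 TFLOPS single precision, ~173 GB/s; treat these as illustrative, not authoritative):

```python
def classify_kernel(flops_per_byte, peak_gflops=2100.0, peak_gbps=173.0):
    """Roofline-style classification: a kernel whose arithmetic intensity
    falls below the 'ridge point' (peak FLOPs / peak bandwidth) cannot
    keep the ALUs busy and is limited by memory bandwidth instead."""
    ridge = peak_gflops / peak_gbps   # ~12 FLOPs/byte for these specs
    return "compute bound" if flops_per_byte >= ridge else "memory bound"

# e.g. a large dense matrix multiply has high arithmetic intensity,
# while an element-wise op like ReLU does roughly one FLOP per byte moved.
print(classify_kernel(50.0))   # -> compute bound
print(classify_kernel(1.0))    # -> memory bound
```

Either way the kernel should scale on the K4200, since both its compute and bandwidth are about 3x the K2000's; what this check will not catch is time spent on the host, which the timing split discussed earlier will.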

What processing in Caffe are you doing, exactly?

In general, neural network training in Caffe should be GPU accelerated (assuming you have built Caffe properly to use the GPU) but other tasks may not be.

I don’t know of any real-world training task that completes in 200 ms. It sounds to me like you are doing image classification (i.e. inference), not training. I’m not sure image classification in Caffe is GPU accelerated (although it may be). Most of the time when people talk about GPU acceleration in Caffe, they mean a long training run that takes minutes, hours, days, or weeks.