NVIDIA Quadro performance

edit_or · October 20, 2015, 8:25pm

This seems to be the most suitable forum for my question. I run the Caffe library [http://caffe.berkeleyvision.org/] with cuDNN. My previous graphics card was an NVIDIA Quadro K2000 with 384 GPU cores. I have now changed to an NVIDIA Quadro K4200 with 1344 GPU cores. But when I run Caffe, the Quadro K4200 doesn’t show a significant improvement in processing speed. I wonder why; I expected a 3-4x improvement in processing speed.
Thanks

njuffa · October 20, 2015, 8:50pm

What kind of task are you benchmarking? Have you profiled the code? Is the task compute bound or memory bound? What speedup are you seeing, exactly? Are both cards running in the same machine, just swapping GPUs? Is it possible this code is limited by host performance, or by copies between the host and the device?
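As a rough first check before reaching for a full profiler, one could time each stage of the pipeline separately with wall-clock timers. The following is only a minimal sketch: the stage names and the two stand-in functions (load_and_preprocess, forward_pass) are hypothetical placeholders, not Caffe APIs, and they just sleep to simulate work. Note that GPU work is often launched asynchronously, so a timed GPU stage should end with whatever call waits for the results to arrive back on the host; otherwise the measured time may be misleading.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to stage_seconds[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def report(timings):
    """Return one formatted line per stage, slowest stage first."""
    total = sum(timings.values())
    lines = []
    for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * seconds / total if total > 0 else 0.0
        lines.append(f"{stage:<12s} {seconds * 1e3:8.1f} ms {share:5.1f}%")
    return lines


# Hypothetical stand-ins for the real work; replace with the actual calls.
def load_and_preprocess():
    time.sleep(0.02)   # pretend host-side image loading and resizing


def forward_pass():
    time.sleep(0.005)  # pretend GPU work, including the wait for its results


with timed("host_prep"):
    load_and_preprocess()
with timed("forward"):
    forward_pass()

print("\n".join(report(stage_seconds)))
```

If the host-side stages dominate the total, a faster GPU cannot be expected to change the overall time much.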

Expectations of about 3x speedup seem reasonable under the assumption that the code spends almost all the time doing work on the GPU, based on the published specifications that indicate that both compute throughput and memory bandwidth of the K4200 are about triple that of the K2000.
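To illustrate why that assumption matters, here is a small sketch based on Amdahl's law. The function name and the example fractions are made up for illustration; the 3x figure is the approximate ratio mentioned above, not a measurement. If only a fraction f of the run time is GPU work and that part becomes s times faster, the overall speedup is 1 / ((1 - f) + f / s).

```python
def overall_speedup(gpu_fraction: float, gpu_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the GPU part gets faster.

    gpu_fraction: share of the original run time spent on the GPU (0..1).
    gpu_speedup:  how many times faster the GPU part becomes (> 0).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)


if __name__ == "__main__":
    # Illustrative fractions only; the real split has to be measured.
    for fraction in (0.10, 0.50, 0.90, 0.99):
        speedup = overall_speedup(fraction, 3.0)
        print(f"GPU share {fraction:4.0%}: overall speedup {speedup:.2f}x")
```

With a small GPU share, the overall speedup stays close to 1x even though the GPU itself is about three times faster, which would be consistent with seeing nearly identical run times on both cards.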

edit_or · October 20, 2015, 8:57pm

Yes, same machine; I just swap the graphics card. When I run images, the K2000 takes about 200 msec and the K4200 also takes about 200 msec. From this discussion, I realize the processing has both a GPU part and a CPU part. The GPU part may have improved, but the whole thing is bottlenecked by the CPU part.
Thanks, let me check.

edit_or · October 20, 2015, 9:22pm

How can I check whether the code is compute bound or memory bound? Thanks

Robert_Crovella · October 20, 2015, 9:56pm

What processing in Caffe are you doing, exactly?

In general, neural network training in Caffe should be GPU accelerated (assuming you have built Caffe properly to use the GPU) but other tasks may not be.

I don’t know of any real-world training task that can be done in 200 msec. It sounds to me like you are doing image classification (i.e. inference), not training. I’m not sure image classification in Caffe is GPU accelerated (although it may be). Most of the time when people talk about GPU acceleration in Caffe, they are talking about a long training run that usually takes minutes, hours, days, or weeks.
