I ran into a question while trying to compare the performance of my code against results in a published paper. We use different GPUs, and I am wondering whether there is any reliable way to compare them side by side.
In the paper, the program runs on a GeForce 8800 GTX, which has 16 multiprocessors and a GPU clock of 570 MHz (according to Wikipedia).
My workstation has a Quadro FX 4600 GPU, which has 12 multiprocessors. I am not exactly sure about its GPU clock frequency: Wikipedia says it is 400 MHz, but other websites (e.g., Wize Commerce) say it is 675 MHz.
My first question is: can I simply scale my speed by 16/12 to get an estimate of the speed on a GeForce 8800 GTX? Has anyone done this before, and is it reliable?
Also, is there any way to figure out the frequency of the GPU clock, especially for the Quadro FX 4600?
There are two clocks of interest for CUDA: the “shader” clock and the memory clock. The shader clock (times the # of SPs or “CUDA cores” or whatever people call them these days) sets the floating point performance. The memory clock (times the width of the bus * 2 for DDR) sets the memory bandwidth. The clocks you are quoting are the “core clocks” and don’t tell you anything about performance. For the 8800 GTX, these clocks are 1.35 GHz (shader) and 0.9 GHz (memory, sometimes quoted as 1.8 GHz because they include the DDR factor of 2).
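If you have the CUDA toolkit installed, you don't have to rely on Wikipedia: the deviceQuery SDK sample (or a few lines of your own) will report the clocks directly. Here is a minimal sketch using cudaGetDeviceProperties; note that the memoryClockRate and memoryBusWidth fields are only populated on reasonably recent toolkits, so fall back to deviceQuery or the vendor spec sheet if they read as zero.

```c
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  multiprocessors : %d\n", prop.multiProcessorCount);
        /* clockRate is reported in kHz; on this generation it is the shader clock */
        printf("  shader clock    : %.2f GHz\n", prop.clockRate / 1e6);
        /* memory clock (kHz, pre-DDR-doubling) and bus width (bits) */
        printf("  memory clock    : %.2f GHz\n", prop.memoryClockRate / 1e6);
        printf("  memory bus width: %d bits\n", prop.memoryBusWidth);
    }
    return 0;
}
```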
The reason I mention both clocks is that CUDA programs can be compute bound, or memory bandwidth bound, or somewhere in between. More programs are memory bandwidth bound than you might expect. So to compare two cards, you should look at the ratio of shader clock * SPs and also the ratio of memory clock * memory bus width. Generally, the performance difference will be somewhere between those two ratios.
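To make that concrete, here is the arithmetic as a tiny C snippet. The 8800 GTX numbers are the ones above (16 multiprocessors with 8 SPs each on that generation, 1.35 GHz shader clock, 0.9 GHz memory clock on a 384-bit bus); the Quadro values are placeholders you should replace with whatever deviceQuery actually reports for your card.

```c
#include <stdio.h>

int main(void) {
    /* GeForce 8800 GTX: 16 MPs x 8 SPs, 1.35 GHz shader, 0.9 GHz x 384-bit memory */
    double gtx_compute = 16 * 8 * 1.35;   /* relative FLOP rate            */
    double gtx_bw      = 0.9 * 2 * 384;   /* relative bandwidth (DDR x 2)  */

    /* Quadro FX 4600: 12 MPs x 8 SPs; clocks/bus width below are ASSUMED
     * placeholders -- substitute the values your deviceQuery reports.     */
    double quadro_shader_ghz = 1.2;
    double quadro_mem_ghz    = 0.7;
    double quadro_bus_bits   = 384;
    double quadro_compute = 12 * 8 * quadro_shader_ghz;
    double quadro_bw      = quadro_mem_ghz * 2 * quadro_bus_bits;

    printf("compute ratio   (GTX/Quadro): %.2f\n", gtx_compute / quadro_compute);
    printf("bandwidth ratio (GTX/Quadro): %.2f\n", gtx_bw / quadro_bw);
    /* Expect the real speedup to land somewhere between the two ratios. */
    return 0;
}
```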
This scaling can be misleading if you compare GPUs with very different capabilities (like the 8800 GTX to a GTX 480), but in your case the 8800 GTX and the Quadro FX 4600 use the same generation of GPU, so the scaling argument should be pretty good.