Confusion about GFlops of c1060/c2050

Hello guys,

I found two peak GFLOPS figures published on the Nvidia Tesla Wikipedia page. Take the C2050 for example: one figure is 1288 GFLOPS and the other is 1030. What exactly is the difference? My guess is that 1030 GFLOPS is something like the raw rate and 1288 is derived from it. Any ideas?

Thanks,
Hardy

The GPUs have multiple functional units that can be active at the same time. If you look at the column titles, you see the higher one says “MUL+ADD+SF”, by which I believe they mean the Multiply-Add instruction dual-issued with a special function instruction (__expf(), __cosf(), etc). The second column says “MUL+ADD”, so the special function contribution has been removed.

Of course, reaching either of these peak GFLOPS figures requires that the GPU have nothing but MAD instructions (or, for the higher figure, MAD plus special function instructions) available for execution, with no other bottlenecks (like waiting on global memory reads).
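
To make the two figures concrete, here is a rough sketch of the arithmetic. The core count (448) and shader clock (1.15 GHz) are the commonly quoted C2050 specs and are my assumption, since they are not stated in this thread. The helper name `peak_gflops` is made up for illustration. I only compute the ratio between the two columns, because how the table counts the special function flops is not spelled out.

```python
def peak_gflops(cores, shader_clock_ghz, flops_per_core_per_cycle):
    """Theoretical peak = cores x clock (GHz) x flops issued per core per cycle."""
    return cores * shader_clock_ghz * flops_per_core_per_cycle

# Assumed C2050 specs (not given in the thread).
C2050_CORES = 448
C2050_CLOCK_GHZ = 1.15

# One multiply-add per core per cycle counts as 2 flops ("MUL+ADD" column).
mad_only = peak_gflops(C2050_CORES, C2050_CLOCK_GHZ, 2)

# The higher published figure ("MUL+ADD+SF" column), taken from the question.
published_with_sf = 1288.0

# Share of the higher figure that is attributed to the special function units.
sf_contribution = published_with_sf - mad_only
ratio = published_with_sf / mad_only

print(f"MAD only:         {mad_only:.1f} GFLOPS")
print(f"SF contribution:  {sf_contribution:.1f} GFLOPS")
print(f"Ratio of columns: {ratio:.2f}")
```

So under these assumptions the 1030 figure is just cores times clock times 2, and the 1288 figure adds roughly a quarter more on top for the special function units.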

That makes sense. Thanks a lot!