Meaning of Operation Throughput

Nikratio · February 17, 2011, 1:45am

Hello,

The CUDA Guide says that a compute capability 2.0 GPU has a throughput of 32 float multiplications per clock cycle per multiprocessor, but that a warp will have to wait 22 clock cycles for the result of such a computation. To me this means that in each multiprocessor there must be 32 pipelines which each hold 22 multiplications in various stages of completion. However, 22 steps for a multiplication seems an awful lot to me, and I don’t really see any other way to get to these numbers… Am I missing something?

Best,
Nikolaus
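
[Editor's sketch] The reasoning in the question above is an instance of Little's law: the amount of work in flight equals throughput times latency. The few lines below just redo that arithmetic with the two numbers quoted in the post; the function and constant names are made up for illustration and do not come from the CUDA documentation.

```python
# Numbers quoted in the post above (compute capability 2.0).
THROUGHPUT_PER_CYCLE = 32  # float multiplications per clock per multiprocessor
LATENCY_CYCLES = 22        # clocks a warp waits for the result


def operations_in_flight(throughput_per_cycle, latency_cycles):
    """Little's law: work in progress = throughput * latency."""
    return throughput_per_cycle * latency_cycles


in_flight = operations_in_flight(THROUGHPUT_PER_CYCLE, LATENCY_CYCLES)
print(in_flight)  # 704
```

So, under this reading, a multiprocessor running at full rate has 32 × 22 = 704 multiplications somewhere between issue and result, which matches the picture of 32 lanes that are each 22 stages deep.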

tera · February 17, 2011, 2:33pm

No, you are seeing it right. The only thing is that unlike modern CPUs, GPUs are not optimized for low latencies but for throughput. (Moderate) latency is easily hidden by having loads of warps (threads) waiting to be scheduled onto the multiprocessor.

Also, the 22 cycles include instruction decode etc., unlike the 4 cycles that common out-of-order CPUs achieve.
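
[Editor's sketch] To make the latency-hiding argument concrete, here is a toy model of a single multiprocessor. It assumes, purely for illustration, that one warp instruction can issue per cycle and that every warp runs a chain of dependent multiplications, so a warp has to wait the full 22 cycles before it can issue again. Real hardware (multiple warp schedulers, instruction-level parallelism within a thread, memory traffic) is more complicated, and `issue_utilization` is a hypothetical helper, not part of any CUDA API.

```python
def issue_utilization(num_warps, latency_cycles=22, total_cycles=2200):
    """Fraction of cycles in which some warp was able to issue.

    Toy model: one issue slot per cycle; after issuing, a warp is blocked
    for latency_cycles because its next instruction needs the result.
    """
    ready_at = [0] * num_warps  # first cycle at which each warp may issue
    issued = 0
    for cycle in range(total_cycles):
        for w in range(num_warps):
            if ready_at[w] <= cycle:
                ready_at[w] = cycle + latency_cycles
                issued += 1
                break  # only one warp instruction issues per cycle
    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 11, 22, 44):
        print(n, "warps ->", round(issue_utilization(n), 3))
```

In this model the issue slot is busy only about 1/22 of the time with a single warp, half the time with 11 warps, and all the time once 22 or more warps are resident. That is the sense in which a moderate latency costs nothing as long as enough warps are waiting to be scheduled.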

Nikratio · February 17, 2011, 3:22pm

I see, thanks!
