# of multiprocessors - still more silly stuff to ask

I always thought the G80 was 8 (6) multiprocessors of 16-wide SIMD at 1.35 (1.2) GHz.

The CUDA programming guide says:

So is there 8 × 16 KB or 16 × 16 KB of shared RAM?

Greetings

     Knax

On the 8800 GTX there are sixteen 16 KB shared memory regions, one per multiprocessor.

On the 8800 GTS there are 12.
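
If you want to check these numbers on your own card, here is a minimal sketch using the CUDA runtime API's cudaGetDeviceProperties (the multiProcessorCount field may not be present in the very earliest CUDA releases):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* On an 8800 GTX this should report 16 multiprocessors, on an
           8800 GTS 12; sharedMemPerBlock is the 16 KB (16384-byte)
           shared memory of one multiprocessor. */
        printf("Device %d: %s\n", dev, prop.name);
        printf("  multiprocessors:   %d\n", prop.multiProcessorCount);
        printf("  shared mem per MP: %lu bytes\n",
               (unsigned long)prop.sharedMemPerBlock);
    }
    return 0;
}
```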

Mark

On page 49 of the programming guide it says that 2 clock cycles are needed to process 32 threads on the 8 processors per multiprocessor. This makes sense if the clock frequency is twice as high internally.

However, on page 51 it says that 2 clock cycles are needed for a floating-point operation. Is this at 650 MHz or at the doubled frequency? At 650 MHz I can't understand how the observed 350 GFLOPS on page 1 is calculated.

It takes 2 clock cycles for a float operation per warp of 32 threads. All clock cycles referred to in the programming guide refer to the 675 MHz instruction clock (on GeForce 8800 GTX). Yes, the "actual" clock in the hardware is 2X that rate (as advertised) and instructions are multi-pumped.

Because instruction decode operates on 32-thread warps, it made more sense for us to talk about things in terms of the 675 MHz clock and 2 cycles per warp rather than the 1350 MHz clock and 4 cycles per warp.

16 multiprocessors * 8 processors/multiprocessor * 2 flops/MAD * 1 MAD/processor-cycle * 1.35 GHz = 345.6 GFLOP/s

Alternatively:

16 multiprocessors * 8 processors/multiprocessor * 2 flops/MAD * 2 MAD/instruction-cycle * 0.675 GHz = 345.6 GFLOP/s
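
Spelled out in plain C (with the 8800 GTX numbers hard-coded purely for illustration), the same arithmetic looks like this:

```c
#include <stdio.h>

int main(void) {
    const double multiprocessors     = 16.0;  /* GeForce 8800 GTX */
    const double processors_per_mp   = 8.0;
    const double flops_per_mad       = 2.0;   /* one multiply + one add */
    const double mads_per_proc_cycle = 1.0;
    const double processor_clock_ghz = 1.35;  /* 2 x the 675 MHz instruction clock */

    double peak_gflops = multiprocessors * processors_per_mp * flops_per_mad *
                         mads_per_proc_cycle * processor_clock_ghz;
    printf("Peak MAD throughput: %.1f GFLOP/s\n", peak_gflops);  /* 345.6 */
    return 0;
}
```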

Mark

By multipumped do you mean pipelined? So the latency of a MAD is 4 processor cycles and the throughput is one per processor cycle (@ 1.35 GHz)?

This is not easily understood from the programming guide.

Perhaps multipumped was the wrong word. I just meant that each warp takes multiple cycles to process.

Mark