Number of threads per scalar processor

Hi,

I understand that at any point in time one scalar processor can handle a maximum of 4 threads. This is because a warp is 32 threads and we have 8 scalar processors in one streaming multiprocessor, so 32/8 = 4 threads per scalar processor. Can anybody verify my understanding?

Thanks

In my understanding each warp consists of 4 cores, each of which supports 8 threads in its 8 stream processors. Each stream processor only supports a single thread, but of course to hide memory latency it is always a good idea to load it with a few hundred threads.

The execution of a single block (and hence also its warps) can never span multiple SMs; each block is executed by a single SM.

IMHO it just takes 4 cycles to execute a warp on a single SM, and these are executed in sequence.
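
As a rough sketch of that arithmetic, assuming a 32-thread warp, 8 scalar processors per SM, and each SP issuing the instruction for one thread per cycle (a simplification; the function name is made up for illustration):

```python
import math

def cycles_per_warp_instruction(warp_size: int = 32, sps_per_sm: int = 8) -> int:
    """Cycles needed to issue one instruction for every thread in a warp,
    assuming each SP handles one thread per cycle (simplified model)."""
    return math.ceil(warp_size / sps_per_sm)

# 32 threads spread over 8 SPs
print(cycles_per_warp_instruction())  # 4
```

Under this model the "4 threads per SP" figure is simply the number of passes each SP makes over one warp, not 4 threads running on one SP at the same instant.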

N.

A Streaming Multiprocessor, as the name suggests, is a MULTI processor. It consists of one instruction processor, 8 scalar processors, shared memory and some other processors for God knows what :)
Each Scalar Processor handles 4 threads at a time. My guess is that while the first thread is being processed, it can already take care of the second, third and fourth ones before the results from the first are obtained. However, some instructions take much more time (e.g. memory access). When that happens, those SPs would be idle. To hide that idling, the 8 scalar processors are assigned to another group of 32 threads (called a warp).

pDan: Chapter 3 by David Kirk and Wen-mei Hwu (CUDA Threads) says the following on page 9:

To summarize, for the GeForce-8 series processors, there can be up to 24 warps residing in each Streaming Multiprocessor at any point in time. We should also point out that the SMs are designed such that only one of these warps will be actually executed by the hardware at any point in time.

This clearly indicates that simultaneously 4 threads are executed by each SM, since each SM has only 8 Scalar processors. Doesn’t that mean we have 4 processing units (that is, ALUs) in each scalar processor?

Warps and threads are not physical entities that reside in an SM. They are logical groupings based on how the SM handles threads.

A single warp consists of 32 threads handled by 8 scalar processors (one SP, not the whole SM, handles 4 threads at a time). For the GF80 series an SM may have 768 active threads, which makes 24 warps. The GTX200 series can have 1024 active threads, which makes 32 warps.
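
The warp counts above are just the active-thread limit divided by the warp size. A small sketch of that calculation, using only the figures quoted in this post (the helper name is hypothetical):

```python
WARP_SIZE = 32  # threads per warp

def resident_warps(max_active_threads: int, warp_size: int = WARP_SIZE) -> int:
    """Number of whole warps that fit within an SM's active-thread limit."""
    return max_active_threads // warp_size

# Active-thread limits per SM as stated above
for series, max_threads in [("GF80", 768), ("GTX200", 1024)]:
    print(series, resident_warps(max_threads))
```

On a real device these limits need not be hard-coded; they can be queried from the device properties at run time.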

I have no idea how you derived “This clearly indicates that simultaneously 4 threads are executed by each SM, since each SM has only 8 Scalar processors.” from the quoted sentence.

Yes PDan, you are right. Actually, in the sentence “This clearly indicates that simultaneously 4 threads are executed by each SM, since each SM has only 8 Scalar processors.”, the second “SM” should have been SP (scalar processor).

Thanks

H