multiprocessors

e.ping · May 21, 2007, 6:14pm

Hi,

I am curious. Since the GeForce 8800 GTS has 12 multiprocessors with 8 processors each, does that mean that with 12 blocks of 8 threads each I am getting the maximum possible parallelism? Is there a speed benefit to specifying more threads, or is there a performance hit?

Thanks.

-DevMike

thegallier · May 21, 2007, 9:06pm

It's a bit trickier than that (it also depends on memory usage, etc.). On the number of blocks you are on the right track: the best count is most likely a multiple of 12 (for the GTS series). The best number of threads per block is most likely a multiple of 32. The calculator in the thread linked below should be helpful.

The Official NVIDIA Forums | NVIDIA
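The rule of thumb above can be sketched as a small host-side calculation. This is only an illustration: the function names are hypothetical, the constant 12 assumes the 12-multiprocessor card discussed in this thread, and rounding up is a starting point for tuning rather than a guaranteed optimum.

```python
WARP_SIZE = 32             # threads per warp
NUM_MULTIPROCESSORS = 12   # assumed: the 12-multiprocessor card discussed here


def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def launch_shape(total_threads, threads_per_block):
    """Return (blocks, threads_per_block) following the rule of thumb.

    Threads per block are rounded up to a whole number of warps, and the
    block count is rounded up to a multiple of the multiprocessor count.
    """
    threads_per_block = round_up(threads_per_block, WARP_SIZE)
    blocks = -(-total_threads // threads_per_block)  # ceiling division
    blocks = round_up(blocks, NUM_MULTIPROCESSORS)
    return blocks, threads_per_block


print(launch_shape(10000, 100))
```

Because both values are rounded up, the launch may contain more threads than work items, so a kernel launched this way would need a bounds check on its index.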

seibert · May 22, 2007, 12:28am

As was mentioned, you want a multiple of 32 threads per block because threads are scheduled and executed in groups of 32 (a “warp”) due to pipelining and other scheduling issues. If possible, you also want more than 12 blocks, because more than one block (from the same kernel) can be interleaved on a given multiprocessor, and this can hide some of the latency to global memory. The number of blocks you can run simultaneously on a multiprocessor will be limited by the shared memory each block requires, and the number of threads per block will be limited by the number of registers each thread needs.
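A rough sketch of how those limits interact is below. The per-multiprocessor figures are assumptions for G80-class hardware (8192 registers, 16 KB of shared memory, at most 768 resident threads and 8 resident blocks), and the function name is hypothetical. The estimate ignores allocation granularity, so the occupancy calculator mentioned earlier remains the better tool for real numbers.

```python
# Assumed per-multiprocessor limits for G80-class hardware.
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16 * 1024   # bytes
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8


def resident_blocks(threads_per_block, regs_per_thread, shared_bytes_per_block):
    """Estimate how many blocks can be resident on one multiprocessor.

    The answer is the tightest of four limits: the block cap, the thread
    cap, the register file, and shared memory.
    """
    limits = [MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block]
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if shared_bytes_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // shared_bytes_per_block)
    return min(limits)


# 256 threads per block, 10 registers per thread, 4 KB shared memory per block.
print(resident_blocks(256, 10, 4096))
# Same block, but 16 registers per thread: the register file becomes the limit.
print(resident_blocks(256, 16, 4096))
```

Under these assumptions, raising register use from 10 to 16 per thread lowers the number of resident 256-thread blocks, which is the trade-off described above.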

paulius · May 22, 2007, 3:17pm

Yes, there is a benefit in using more threads per threadblock:

- You need at least 16 threads (a half-warp) to get coalescing when accessing global memory (coalescing significantly improves performance).
- Having more threads helps “hide” global memory access latency (which for G80 is between 400 and 600 clocks). Think of it this way: if multiple threads per processor issue memory access instructions (say, reads), you incur the latency once, after which a new read completes with each clock. It is the same idea as pipelining.
- Having more threads helps hide read-after-write register conflicts. According to the programming guide, you need 192 threads per multiprocessor to completely avoid a performance hit due to these conflicts (if your code creates them).
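One way to see where a figure like 192 can come from is the arithmetic below. It is a sketch under assumptions not stated in this thread: a register read-after-write latency of roughly 24 clocks, and a 32-thread warp taking 4 clocks to issue one instruction on 8 processors. The function name is hypothetical.

```python
WARP_SIZE = 32                     # threads per warp
PROCESSORS_PER_SM = 8              # processors per multiprocessor (from the thread)
REGISTER_RAW_LATENCY_CLOCKS = 24   # assumed figure, not stated in the thread


def threads_to_cover_register_latency(latency_clocks):
    """Threads needed so other warps can issue while one waits on a register."""
    # One warp instruction occupies the 8 processors for 32 / 8 = 4 clocks.
    clocks_per_warp_instruction = WARP_SIZE // PROCESSORS_PER_SM
    # Ceiling division: number of warps whose issue slots fill the latency.
    warps = -(-latency_clocks // clocks_per_warp_instruction)
    return warps * WARP_SIZE


print(threads_to_cover_register_latency(REGISTER_RAW_LATENCY_CLOCKS))
```

With these assumed numbers the result is 6 warps, or 192 threads, which matches the figure quoted from the programming guide.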

Paulius
