How many threads can execute in parallel?

explode · April 24, 2009, 1:34pm

Hello everybody.

I’m trying to learn a little programming with CUDA on a GeForce 9800 GT, and I’m a little confused about parallel execution on the device. The programming guide says that the 9800 GT has 16 multiprocessors and that every multiprocessor has 8 processors, so the device can execute 16*8=128 threads in parallel. Is that correct? And what does a warp mean? A warp has 32 threads per multiprocessor, but the multiprocessor has only 8 processors, so a multiprocessor can only execute 8 threads in parallel. What happens to the rest of the threads, 32-8=24?

I’m probably very wrong in my conclusions, so please correct me!

Thank you for your time.

I will have more questions later.

theMarix · April 24, 2009, 1:37pm

The programming guide states that a multiprocessor processes one warp in 4 cycles. This allows the shaders to be clocked higher than the instruction decoder.
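The arithmetic behind that statement can be sketched in a few lines. This is only a back-of-the-envelope calculation in plain Python (so it runs without a GPU); the device figures are the ones quoted in the question and are assumptions, not verified specifications, and the function names are made up for illustration.

```python
# Figures as quoted in the question; treat them as assumptions, not verified specs.
MULTIPROCESSORS = 16   # multiprocessors on the device (per the question)
CORES_PER_MP = 8       # scalar processors per multiprocessor
WARP_SIZE = 32         # threads per warp


def cycles_per_warp_instruction(warp_size=WARP_SIZE, cores=CORES_PER_MP):
    """Shader clock cycles one multiprocessor needs to push a whole warp through one instruction."""
    return warp_size // cores


def threads_per_cycle(mps=MULTIPROCESSORS, cores=CORES_PER_MP):
    """Threads that can be worked on in a single shader clock cycle, device-wide."""
    return mps * cores


def threads_per_warp_issue(mps=MULTIPROCESSORS, warp_size=WARP_SIZE):
    """Threads covered device-wide when every multiprocessor processes one warp."""
    return mps * warp_size


print("cycles per warp instruction:", cycles_per_warp_instruction())
print("threads per cycle (device):", threads_per_cycle())
print("threads per warp issue (device):", threads_per_warp_issue())
```

With these numbers, 32 threads spread over 8 processors gives the 4 cycles mentioned above, so the "missing" 24 threads are simply handled in the following cycles.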

_Big_Mac · April 24, 2009, 2:58pm

You can think of it as the remaining 24 threads of each warp being in a pipeline. That means that all 32 get processed at the same time but only 8 of them are being finished every shader clock cycle.

The pipeline picture is not exactly true, but it’s a reasonable abstraction. Nvidia doesn’t really elaborate much on how this works deep in the hardware.
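A toy model of that abstraction might look like the sketch below. It assumes the warp is handled in consecutive groups of 8 threads, one group per shader clock cycle; that ordering is an assumption made for illustration and, as noted above, not a description of what the hardware really does. The function name `warp_schedule` is hypothetical.

```python
def warp_schedule(warp_size=32, cores=8):
    """Return, for each shader clock cycle, the thread indices assumed to finish in that cycle.

    Toy model only: threads are grouped in index order, `cores` at a time.
    """
    return [
        list(range(start, min(start + cores, warp_size)))
        for start in range(0, warp_size, cores)
    ]


for cycle, threads in enumerate(warp_schedule()):
    print(f"cycle {cycle}: threads {threads[0]}-{threads[-1]}")
```

In this model every thread of the warp runs the same instruction, and the whole warp is done after 4 cycles.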

explode · April 24, 2009, 4:48pm

Thank you for your reply. So the 9800 GT can process 512 threads in 4 clock cycles. So for maximum speed, is it better not to use more than 512 threads?

Thanks.

_Big_Mac · April 24, 2009, 5:54pm

No, you will want to use tens of thousands of threads for best performance. There’s virtually no penalty for having “too many”, and more threads means better saturation of all the queues and pipelines. In my current app, I have a million and that’s not considered very much either.

512 threads is way too few.
Read the Programming Guide if you haven’t already; this is answered there.
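To put "a million threads" in perspective, here is a rough sketch of the launch arithmetic, again in plain Python. The block size of 256 threads is an arbitrary choice for illustration, the 128 processors come from the figures quoted in the question, and `launch_config` is a hypothetical helper, not a CUDA API.

```python
def launch_config(n_threads, threads_per_block=256, warp_size=32):
    """Return (blocks, total_warps) needed to cover n_threads.

    Uses the usual round-up integer division so that a partial last block
    is still launched.
    """
    blocks = (n_threads + threads_per_block - 1) // threads_per_block
    warps_per_block = (threads_per_block + warp_size - 1) // warp_size
    return blocks, blocks * warps_per_block


N_THREADS = 1_000_000
DEVICE_CORES = 128  # 16 multiprocessors * 8 processors, as quoted in the question

blocks, warps = launch_config(N_THREADS)
print("blocks:", blocks)
print("warps:", warps)
print("threads per processor:", N_THREADS / DEVICE_CORES)
```

The point is only that the thread count is several thousand times the number of processors, which gives the scheduler plenty of warps to switch between while others wait.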
