Maximum of threads On 8600GT

GSRush · April 6, 2008, 3:16pm

I want to launch the maximum number of threads on this card.
As I understand it, it has 4 multiprocessors,
but I can't work out how blocks fit into this.
Can anyone advise,
or tell me where to look?

DenisR · April 6, 2008, 7:32pm

The programming guide has it all.

MisterAnderson42 · April 6, 2008, 8:42pm

Well, the maximum grid size is 65535x65535 and the maximum number of threads per block is 512, so you can execute approximately 2e12 threads. Definitely only try this from the Linux console (no X) or with a second display in Windows, since it is bound to take a very long time even to launch an empty kernel.
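
As a quick check of that figure, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the two limits quoted above; the constant names are made up for illustration and are not CUDA API identifiers.

```python
# Limits quoted in the post above; treated here as given, not queried from a device.
MAX_GRID_DIM = 65535          # maximum grid size in each of the two dimensions
MAX_THREADS_PER_BLOCK = 512   # maximum number of threads in one block

# Largest launch = every grid slot filled with a full-size block.
max_blocks = MAX_GRID_DIM * MAX_GRID_DIM
max_threads = max_blocks * MAX_THREADS_PER_BLOCK

print(f"blocks:  {max_blocks}")
print(f"threads: {max_threads} (about {max_threads:.1e})")
```

The product comes out a little above 2e12, which matches the "approximately 2e12" estimate.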

Anyway, as DenisR said: the programming guide has it all (and is very well written). Especially relevant is the part where it says that launching hundreds of blocks is needed to reach optimal performance.

GSRush · April 8, 2008, 4:56am

Well, that is purely theoretical…

But in the occupancy calculator I see this for G84:
Active Threads per Multiprocessor: 512
Active Thread Blocks per Multiprocessor: 1
Multiprocessors per GPU: 4

So shouldn't it work out like this:

threads = multiprocessors per GPU * Active thread blocks * Active threads per Multiprocessor

threads = 4 * 1 * 512;
threads = 16384

Or am I wrong?

jdigittl · April 8, 2008, 5:08am

The GPU can have more than just the active threads loaded at the same time. The GPU schedules time between active threads and those that are waiting for execution, and context switches are very fast. So while only a certain number are actually running at one time, many, many more can be waiting to run. Generally you want to balance the time that it takes to read from memory against the time that instructions take to process on the GPU, so that while one set of threads (a warp) is waiting to read or write to RAM, another set is running on data that has already been loaded.

Take a quick peek at section 3.2 of the programming guide.

MisterAnderson42 · April 8, 2008, 12:26pm

Umm, 4 * 512 = 2048. But 512-thread blocks are not the way to get the largest number of threads actively running on the device. Maximum occupancy is 24 warps per multiprocessor => threads = 24 * 32 * multiprocessors

In your case, with multiprocessors = 4, threads = 3072.
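
To make the two numbers easier to compare, here is a small Python sketch of the arithmetic. It only uses figures from this thread (24 warps per multiprocessor, 32 threads per warp, 4 multiprocessors, and the one 512-thread block per multiprocessor reported by the occupancy calculator); the variable names are invented for illustration.

```python
# Figures taken from this thread; names are illustrative, not CUDA API identifiers.
WARP_SIZE = 32            # threads per warp
MAX_WARPS_PER_MP = 24     # maximum resident warps per multiprocessor
MULTIPROCESSORS = 4       # multiprocessors on the card discussed here

# Upper limit on simultaneously resident threads.
max_resident_per_mp = MAX_WARPS_PER_MP * WARP_SIZE
max_resident_total = max_resident_per_mp * MULTIPROCESSORS

# The configuration from the occupancy calculator: 1 block of 512 threads per multiprocessor.
resident_with_512_blocks = 1 * 512 * MULTIPROCESSORS
occupancy_with_512_blocks = 512 / max_resident_per_mp

print(max_resident_per_mp, max_resident_total)
print(resident_with_512_blocks, round(occupancy_with_512_blocks, 2))
```

With one 512-thread block per multiprocessor, only two thirds of the 768 thread slots on each multiprocessor are filled, which is why that configuration gives 2048 rather than 3072 resident threads.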

But running exactly this number of threads (or the number matching the occupancy of your kernel) isn’t going to be the most efficient. Consider it a lower bound. To really get into the linear performance region of the card, you need 2-3 times this number of threads. The GPU is built to swap a new thread block in instantly when one completes. So if you only fill your GPU with exactly 3072 threads, any blocks that finish faster than the others will result in “wasted” GPU time as a multiprocessor sits partially idle.
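
As a rough illustration of the 2-3 times rule of thumb, the sketch below turns it into a minimum block count. The block size of 256 threads is purely an assumption picked for the example, and `min_blocks` is a hypothetical helper; whether a given block size actually reaches full occupancy depends on the kernel's register and shared memory use, which this does not model.

```python
def min_blocks(resident_threads, factor, threads_per_block):
    """Smallest block count that gives at least factor * resident_threads threads."""
    target_threads = resident_threads * factor
    # Ceiling division, so a partial block still counts as a whole one.
    return -(-target_threads // threads_per_block)

RESIDENT_THREADS = 3072    # lower bound computed above for 4 multiprocessors
THREADS_PER_BLOCK = 256    # assumed block size, for illustration only

for factor in (1, 2, 3):
    blocks = min_blocks(RESIDENT_THREADS, factor, THREADS_PER_BLOCK)
    print(f"{factor}x -> {blocks} blocks of {THREADS_PER_BLOCK} threads")
```

Under these assumptions the rule of thumb asks for roughly 24 to 36 blocks at the very least; real workloads usually launch far more, in line with the programming guide's advice about hundreds of blocks.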

As I said though: this calculation is still useful as a lower bound to get decent efficiency.

GSRush · April 9, 2008, 12:20am

Oops :D

Oh, I forgot to write that down in the formula there.
