The maximum number of blocks and threads

What is the maximum number of blocks and threads in a grid? While using the GPU to calculate the product of two big matrices, I found that the results from the GPU and the CPU differ, so I guess the number of threads is restricted.

Yes, these numbers are restricted. Please check the Programming Guide for details.

But I cannot find it in “Programming_Guide_2.0beta2”. Can you tell me the numbers? My graphics card is a 9600 GT, and I am using CUDA 2.0.

Appendix A.1.1:

"The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512,

and 64, respectively"

“The maximum size of each dimension of a grid of thread blocks is 65535”
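
To make those numbers concrete, here is a minimal sketch of a launch configuration that stays within the documented limits. The kernel and the sizes are illustrative assumptions, not anyone's actual code:

```cpp
#include <cuda_runtime.h>

__global__ void emptyKernel() { }   // placeholder kernel body

int main()
{
    // Thread block: at most 512 threads total on these devices,
    // with per-dimension maxima of 512 x 512 x 64.
    dim3 block(16, 16);       // 16 * 16 = 256 threads per block

    // Grid: each dimension may be at most 65535.
    dim3 grid(4096, 4096);    // both dimensions well under 65535

    emptyKernel<<<grid, block>>>();
    cudaThreadSynchronize();  // CUDA 2.0-era synchronization call
    return 0;
}
```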

Additional note: run the CUDA SDK example “deviceQuery”. On your card it should report a maximum of 512 threads per block, and only two usable dimensions in the grid (the z-dimension of a grid is fixed at 1 on these devices).
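
If you'd rather query the limits from code than read deviceQuery's output, here is a minimal sketch using the runtime API (device 0 assumed):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0

    printf("Device: %s\n", prop.name);
    printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    printf("Max block dims:        %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("Max grid dims:         %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return 0;
}
```

On a 9600 GT this should report 512 threads per block, block dimensions of 512 x 512 x 64, and grid dimensions of 65535 x 65535 x 1.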

Thanks, everybody!
But what I really want to know is the maximum number of threads per grid.

512 * (65535^2) = 2 198 956 147 200 threads in a single grid.

The biggest matrix I calculated was 4720 * 4720 = 22 278 400 elements, which is far smaller than both that number and even the one-dimensional limit of 512 * 65535 = 33 553 920.
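
For context, here is a minimal sketch of how a matrix that size might be covered by a 2D grid. The 16 x 16 block size and the element-wise kernel are assumptions for illustration, not the original poster's code:

```cpp
#include <cuda_runtime.h>

// Hypothetical element-wise kernel over an n x n matrix.
__global__ void touchMatrix(float *m, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        m[row * n + col] *= 2.0f;
}

int main()
{
    const int n = 4720;
    float *d_m;
    cudaMalloc(&d_m, n * n * sizeof(float));

    dim3 block(16, 16);                      // 256 threads per block
    dim3 grid((n + block.x - 1) / block.x,   // 295 blocks in x
              (n + block.y - 1) / block.y);  // 295 blocks in y
    // 295 x 295 = 87 025 blocks total, with each grid dimension
    // nowhere near the 65535-per-dimension limit.
    touchMatrix<<<grid, block>>>(d_m, n);

    cudaFree(d_m);
    return 0;
}
```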

It would be interesting to launch a grid of that size with an empty kernel to see how much overhead it incurs.

I’ve just benchmarked launching the biggest possible empty kernel (65535^2 blocks, 512 threads per block), and it actually takes less time than launching a smaller kernel (256 blocks). WTH.

10,000 launches, with a cudaThreadSynchronize() after each:
big kernel - 209 ms (21 microseconds per launch)
small kernel - 300 ms (30 microseconds per launch)
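
For reference, a minimal sketch of this kind of benchmark (my reconstruction, not the actual code from the post; as the EDIT below shows, the original "big" launch had its arguments swapped, and a genuinely maxed-out grid should be expected to take far longer):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void emptyKernel() { }

// Time `iters` launches of an empty kernel, synchronizing after each.
static void bench(const char *label, dim3 grid, dim3 block, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i) {
        emptyKernel<<<grid, block>>>();
        cudaThreadSynchronize();   // wait after each launch, as in the post
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%s: %.0f ms total (%.1f us per launch)\n",
           label, ms, 1000.0f * ms / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}

int main()
{
    bench("big kernel",   dim3(65535, 65535), dim3(512), 10000);
    bench("small kernel", dim3(256),          dim3(512), 10000);
    return 0;
}
```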

Also, apparently I can launch empty kernels on grids bigger than 65535x65535. I must be doing something wrong?

EDIT: Duh! I swapped gridDim and blockDim in the launch parameters.
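
For anyone else hitting this: in the <<<...>>> launch syntax the grid dimensions come first and the block dimensions second. A minimal sketch of the pitfall (hypothetical empty kernel):

```cpp
#include <cuda_runtime.h>

__global__ void emptyKernel() { }

int main()
{
    dim3 grid(65535, 65535);   // blocks in the grid
    dim3 block(512);           // threads per block

    // Correct order: grid first, then block.
    emptyKernel<<<grid, block>>>();

    // The bug: arguments swapped. This requests a 512-block grid of
    // 65535 x 65535-thread blocks -- an invalid configuration, so the
    // launch fails immediately; without error checking it just looks
    // like a suspiciously fast "big" kernel.
    emptyKernel<<<block, grid>>>();
    cudaError_t err = cudaGetLastError();  // cudaErrorInvalidConfiguration expected
    (void)err;
    return 0;
}
```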

Now launching a maxed-out kernel results in either a timeout error or a bluescreen, so the overhead is surely larger than about 10 s. Perhaps a Linux machine with no watchdog timer could run such a kernel.