Maximum blocks per grid


Is there any limit on the number of blocks per kernel launch (I mean per grid)?
If yes, what is it?


Yes, there is a limit. Appendix F of the current CUDA programming guide lists it for the hardware you are using.
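A program can also ask the runtime for these limits instead of looking them up in the guide. Below is a minimal sketch that assumes the first CUDA device (index 0) is the one of interest. `cudaGetDeviceProperties` and the `maxGridSize`, `maxThreadsDim` and `maxThreadsPerBlock` fields of `cudaDeviceProp` are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: the first CUDA device is the one of interest
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device %d: %s (compute capability %d.%d)\n",
                device, prop.name, prop.major, prop.minor);
    // Maximum number of blocks along each grid dimension (x, y, z).
    std::printf("maxGridSize:        %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    // Maximum number of threads along each block dimension, and per block in total.
    std::printf("maxThreadsDim:      %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```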


Is it the "Maximum x- or y-dimension of a grid of thread blocks", which is 65535?

The limit is 65535 in each grid dimension that your hardware and CUDA version support. Before Fermi and CUDA 4.0, grids were 2D only; on Fermi with CUDA 4.0, they can be 3D.
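A common way to use more than 65535 blocks is to spread a linear block index over the x and y dimensions of a 2D grid. The sketch below assumes a 1D problem of `n` elements. `linearBlockIndex`, `gridFor` and `scale` are hypothetical helper names, and `maxDim` defaults to the 65535 limit discussed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Linear block index for a 2D grid: rows of gridDimX blocks stacked along y.
__host__ __device__ inline unsigned long long linearBlockIndex(
        unsigned int bx, unsigned int by, unsigned int gridDimX) {
    return (unsigned long long)by * gridDimX + bx;
}

// Hypothetical helper: a 2D grid with at least numBlocks blocks, x <= maxDim.
// Assumes numBlocks <= maxDim * maxDim so that y also stays within the limit.
inline dim3 gridFor(unsigned long long numBlocks, unsigned int maxDim = 65535) {
    if (numBlocks == 0) return dim3(1, 1, 1);
    unsigned int x = numBlocks < maxDim ? (unsigned int)numBlocks : maxDim;
    unsigned long long y = (numBlocks + x - 1) / x;  // ceiling division
    return dim3(x, (unsigned int)y, 1);
}

__global__ void scale(float* data, unsigned long long n, float factor) {
    unsigned long long block = linearBlockIndex(blockIdx.x, blockIdx.y, gridDim.x);
    unsigned long long i = block * blockDim.x + threadIdx.x;
    if (i < n) {  // the last row of blocks may be only partly used
        data[i] *= factor;
    }
}

int main() {
    const unsigned long long n = 1ULL << 24;  // example: 16,777,216 elements
    const unsigned int threads = 256;
    const unsigned long long blocks = (n + threads - 1) / threads;  // 65,536 blocks
    dim3 grid = gridFor(blocks);

    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));
    scale<<<grid, threads>>>(d, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("grid %u x %u: %s\n", grid.x, grid.y, cudaGetErrorString(err));
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Here 65,536 blocks are needed, one more than fits along x, so `gridFor` returns a 65535 × 2 grid. The bounds check in the kernel skips the unused blocks in the second row.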

But why is it 65535 and not 65536, or some other multiple of 16 or 32?

Ask an Nvidia hardware engineer, but quite likely it is because there are only 16 bits to store it.
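Here is the arithmetic behind that guess. An unsigned 16-bit field holds values from 0 to 2^16 - 1 = 65535, so storing 65536 would need a 17th bit. The following host-only sketch checks this with the standard `<cstdint>` limits and needs no GPU:

```cuda
#include <cstdint>
#include <cstdio>

// The largest value an unsigned 16-bit field can hold is 2^16 - 1.
static_assert(UINT16_MAX == 65535, "16 bits hold at most 65535");
// 65536 == 2^16 needs a 17th bit, so converting it to 16 bits wraps to 0.
static_assert(static_cast<std::uint16_t>(65536) == 0, "65536 wraps to 0 in 16 bits");

int main() {
    std::uint16_t dim = UINT16_MAX;
    std::printf("max 16-bit value: %u\n", static_cast<unsigned>(dim));
    ++dim;  // wraps around to 0
    std::printf("after increment:  %u\n", static_cast<unsigned>(dim));
    return 0;
}
```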

The grid size is not a serious limitation. If 65535×65535 blocks are not enough, you can add a loop to your kernel code or use multiple kernel launches.
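The "add a loop to your kernel code" option is usually written as a grid-stride loop. The following is a minimal sketch. The kernel name `addOne`, the array size and the 1024-block grid are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles i, i + stride, i + 2*stride, ...
// so a fixed-size grid can cover any n, however many blocks n would need.
__global__ void addOne(float* data, size_t n) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] += 1.0f;
    }
}

int main() {
    const size_t n = 20000000;  // example size: 78,125 blocks of 256 threads
    const int threads = 256;
    const int blocks = 1024;    // assumption: a modest grid, well under 65535

    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));  // all-zero bytes are 0.0f
    addOne<<<blocks, threads>>>(d, n);
    cudaError_t err = cudaDeviceSynchronize();

    float last = 0.0f;  // the last element is reached only through the loop
    cudaMemcpy(&last, d + n - 1, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("%s, last element = %f\n", cudaGetErrorString(err), last);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Each thread advances by the total number of threads in the grid, so the same launch configuration works for any `n`.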

OK, now when I try to launch a 65535×65 grid (4,259,775 blocks), my screen goes black for a while and the system stops responding.

Any idea what is actually going on?

Am I launching too many blocks in a single kernel launch?

My kernel just evaluates a one-line equation.

If your GUI runs on the same GPU as your kernel, the screen is not updated while the kernel executes. There is also a kernel run-time limit of about 5 s, which ensures you regain control of your system after a while.
Either spread the work over several kernel invocations to give the GUI a chance to update the screen in between, or buy a second GPU so that CUDA and the GUI can use independent GPUs.
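Spreading the work over several invocations might look like the following sketch. The kernel `lineEq` (computing `y = a*x + b`) is a hypothetical stand-in for the one-line equation. The chunk size of 4096 blocks per launch is an assumption; tune it until each launch finishes well inside the time limit. The `kernelExecTimeoutEnabled` field of `cudaDeviceProp` reports whether such a limit applies to the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the "one-line equation": y[i] = a * x[i] + b.
__global__ void lineEq(const float* x, float* y, size_t offset, size_t count,
                       float a, float b) {
    size_t i = offset + (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + count) {
        y[i] = a * x[i] + b;
    }
}

// Number of blocks needed to cover `count` elements with `threads` per block.
inline unsigned int blocksFor(size_t count, unsigned int threads) {
    return (unsigned int)((count + threads - 1) / threads);
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    std::printf("Kernel run-time limit on device 0: %s\n",
                prop.kernelExecTimeoutEnabled ? "yes" : "no");

    const size_t n = 16u << 20;                   // example: 16,777,216 elements
    const unsigned int threads = 256;
    const size_t chunk = (size_t)threads * 4096;  // assumption: elements per launch

    float *x = nullptr, *y = nullptr;
    if (cudaMalloc(&x, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&y, n * sizeof(float)) != cudaSuccess) { cudaFree(x); return 1; }
    cudaMemset(x, 0, n * sizeof(float));

    int status = 0;
    for (size_t offset = 0; offset < n; offset += chunk) {
        size_t count = (n - offset < chunk) ? n - offset : chunk;
        lineEq<<<blocksFor(count, threads), threads>>>(x, y, offset, count, 2.0f, 1.0f);
        // Finish each short kernel before queuing the next, and check for
        // errors (such as a launch timeout) chunk by chunk.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::fprintf(stderr, "chunk at %zu failed: %s\n", offset, cudaGetErrorString(err));
            status = 1;
            break;
        }
    }
    cudaFree(x);
    cudaFree(y);
    return status;
}
```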

With 4,259,775 blocks, how much memory do you use?
I think the problem is memory:
when I try to use more than 480 MB of my 512 MB, it crashes.
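To answer the memory question, a rough check can multiply out the grid size and compare the result with the free device memory reported by `cudaMemGetInfo`. This sketch assumes one `float` per thread and 256 threads per block; the thread states neither. `bytesNeeded` is a hypothetical helper.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: bytes for one array with bytesPerThread per grid thread.
inline unsigned long long bytesNeeded(unsigned long long blocks,
                                      unsigned long long threadsPerBlock,
                                      unsigned long long bytesPerThread) {
    return blocks * threadsPerBlock * bytesPerThread;
}

int main() {
    const unsigned long long blocks = 65535ULL * 65;  // 4,259,775 blocks, as in the question
    const unsigned long long threads = 256;           // assumption: not stated in the thread
    const unsigned long long need = bytesNeeded(blocks, threads, sizeof(float));

    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("needed: %llu MiB, free: %zu MiB of %zu MiB\n",
                need >> 20, freeBytes >> 20, totalBytes >> 20);
    if (need > freeBytes) {
        std::printf("Allocation would not fit; split the work into smaller pieces.\n");
        return 0;
    }

    float* d = nullptr;
    err = cudaMalloc(&d, (size_t)need);  // always check allocations on a small card
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d);
    return 0;
}
```

Under those assumptions, the 65535 × 65 grid needs 4,362,009,600 bytes (about 4 GiB) for a single array. That is far more than a 512 MB card has.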