I’m a total newbie here so this should be easy. Here’s part of my deviceQuery:

Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

My questions are:

What is a Warp?

For the dimensions of the grid, is this the number of threads or number of blocks?

A warp is a group of 32 threads that a multiprocessor executes together.
You should read the CUDA Programming Guide; it covers this.
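To make the grouping concrete, here is a minimal host-side sketch of how a thread's index within a one-dimensional block maps to a warp. It assumes the warp size of 32 reported by deviceQuery and the usual rule that warps are formed from consecutive thread indices, starting at thread 0. The names `warp_index` and `lane_index` are made up for illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// "Warp size: 32" from the deviceQuery output above.
constexpr int kWarpSize = 32;

// Which warp a thread belongs to within its 1D block (hypothetical helper).
int warp_index(int thread_in_block) {
    assert(thread_in_block >= 0);
    return thread_in_block / kWarpSize;
}

// Position of the thread inside its warp, 0..31 (hypothetical helper).
int lane_index(int thread_in_block) {
    assert(thread_in_block >= 0);
    return thread_in_block % kWarpSize;
}
```

So threads 0 to 31 form warp 0, threads 32 to 63 form warp 1, and so on.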

The execution model is as follows: a kernel launch creates one grid of blocks, each block is made up of threads, and the hardware groups the threads of a block into warps.


  • Dimension of Grid = number of blocks

  • Dimension of Block = number of threads

  • 32 threads = 1 warp

kernel<<<100,320>>>(…); will thus schedule 1 grid of 100 blocks, each having 320 threads (10 warps).
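The arithmetic behind that launch can be sketched in a few lines of host-side code. This is only an illustration, assuming a warp size of 32; `warps_per_block` and `total_threads` are hypothetical helper names, not CUDA functions.

```cpp
#include <cassert>

constexpr int kWarpSize = 32;  // assumed from "Warp size: 32" in deviceQuery

// Warps needed for one block. A partially filled last warp still
// occupies a whole warp, hence the rounding up.
int warps_per_block(int threads_per_block) {
    assert(threads_per_block > 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Total threads created by a launch of `blocks` blocks with
// `threads_per_block` threads each.
long long total_threads(int blocks, int threads_per_block) {
    assert(blocks > 0 && threads_per_block > 0);
    return static_cast<long long>(blocks) * threads_per_block;
}
```

For the launch above, `warps_per_block(320)` gives 10 and `total_threads(100, 320)` gives 32000. A block of 321 threads would need 11 warps, which is one reason block sizes are usually chosen as multiples of 32.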

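Note that each of the X, Y, Z values in ‘X x Y x Z’ is a per-dimension maximum, and the overall configuration is subject to further constraints that depend on your GPU's architecture number (1.0-1.3) [aka “Compute Capability”]. For example, the total number of threads in a block (X x Y x Z) cannot exceed the maximum number of threads per block, which is 512 in the deviceQuery output above, so a 512 x 512 x 64 block is not a valid configuration. From memory, two particularly relevant sections of the Programming Guide are 5.2 and Appendix A.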
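As a rough sketch of how those limits combine, the following host-side check tests a launch configuration against the numbers from the deviceQuery output in the question. `Dim3`, `Limits`, `kQueryLimits` and `launch_fits` are all hypothetical names for illustration (`Dim3` just stands in for CUDA's `dim3`), and the check covers only the limits listed in this thread, not registers, shared memory or anything else the Programming Guide describes.

```cpp
// Hypothetical stand-in for CUDA's dim3.
struct Dim3 { unsigned x, y, z; };

// Limits as printed by deviceQuery (hypothetical container type).
struct Limits {
    unsigned max_threads_per_block;
    Dim3 max_block;  // per-dimension block maxima
    Dim3 max_grid;   // per-dimension grid maxima
};

// Values copied from the deviceQuery output in the question.
const Limits kQueryLimits = {512, {512, 512, 64}, {65535, 65535, 1}};

// True if every dimension is within its own maximum AND the total
// number of threads per block does not exceed the per-block limit.
bool launch_fits(const Limits& lim, Dim3 grid, Dim3 block) {
    const bool block_dims_ok =
        block.x >= 1 && block.x <= lim.max_block.x &&
        block.y >= 1 && block.y <= lim.max_block.y &&
        block.z >= 1 && block.z <= lim.max_block.z;
    const bool grid_dims_ok =
        grid.x >= 1 && grid.x <= lim.max_grid.x &&
        grid.y >= 1 && grid.y <= lim.max_grid.y &&
        grid.z >= 1 && grid.z <= lim.max_grid.z;
    const unsigned long long threads =
        1ULL * block.x * block.y * block.z;
    return block_dims_ok && grid_dims_ok &&
           threads <= lim.max_threads_per_block;
}
```

With these limits, a 16 x 16 x 2 block (512 threads) passes, while a 32 x 32 x 1 block (1024 threads) fails even though each dimension is individually below its maximum.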

Again, this is all explained in the programming guide:

CUDA Programming Guide 2.3