The execution model is as follows: a kernel launch creates a grid of blocks, each block contains threads, and the threads of a block are grouped into warps.
Dimension of Grid = number of blocks
Dimension of Block = number of threads
32 threads = 1 warp
kernel<<<100,320>>>(…); will thus schedule 1 grid of 100 blocks, each having 320 threads (10 warps).
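To make the launch above concrete, here is a minimal sketch of what such a call might look like in full. The kernel name recordIndex, the helper warpsPerBlock and the wrapper launchExample are made up for illustration and do not come from the programming guide; only the <<<100,320>>> configuration and the 32-thread warp size are taken from the text above.

```cuda
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // "32 threads = 1 warp"

// Warps needed to hold one block; a partially filled warp still counts as one.
constexpr int warpsPerBlock(int threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Hypothetical kernel: every thread writes its own global index.
__global__ void recordIndex(int *out, int n) {
    int global = blockIdx.x * blockDim.x + threadIdx.x;  // position within the whole grid
    if (global < n) out[global] = global;
}

// Launches 1 grid of 100 blocks x 320 threads and returns any CUDA error.
cudaError_t launchExample() {
    const int blocks = 100, threads = 320;
    const int n = blocks * threads;  // 32000 threads in total
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    if (err != cudaSuccess) return err;
    recordIndex<<<blocks, threads>>>(d_out, n);
    err = cudaGetLastError();                               // bad launch configuration shows up here
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors raised while the kernel ran
    cudaFree(d_out);
    return err;
}
```

Calling launchExample() from main and checking that it returns cudaSuccess is enough to try it out. warpsPerBlock(320) gives the 10 warps mentioned above, and a block size that is not a multiple of 32 rounds up, since the last warp is only partly filled.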
Note that each of the X, Y, Z values in ‘X x Y x Z’ is a per-dimension maximum, and the overall X, Y, Z configuration is subject to further constraints (for example, a limit on the total number of threads per block) which depend on your particular GPU's architecture version (1.0-1.3), also known as its “Compute Capability”. From memory, two particularly relevant sections of the Programming Guide are 5.2 and Appendix A.
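Rather than looking the limits up by hand, you can also ask the runtime for them. The following is only a sketch: the helper names fitsLimits and printLaunchLimits are my own, and the check covers just the per-dimension maximums and the threads-per-block limit, not every constraint the guide lists (registers and shared memory per block are ignored here).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if a grid/block configuration respects the
// per-dimension maximums and the total threads-per-block limit in `p`.
bool fitsLimits(dim3 grid, dim3 block, const cudaDeviceProp &p) {
    unsigned long long threads =
        (unsigned long long)block.x * block.y * block.z;
    return block.x <= (unsigned)p.maxThreadsDim[0] &&
           block.y <= (unsigned)p.maxThreadsDim[1] &&
           block.z <= (unsigned)p.maxThreadsDim[2] &&
           threads <= (unsigned long long)p.maxThreadsPerBlock &&
           grid.x <= (unsigned)p.maxGridSize[0] &&
           grid.y <= (unsigned)p.maxGridSize[1] &&
           grid.z <= (unsigned)p.maxGridSize[2];
}

// Hypothetical helper: prints the launch limits of one device.
// Returns false if the device properties could not be read.
bool printLaunchLimits(int device) {
    cudaDeviceProp p;
    if (cudaGetDeviceProperties(&p, device) != cudaSuccess) return false;
    printf("Compute Capability: %d.%d\n", p.major, p.minor);
    printf("Warp size: %d\n", p.warpSize);
    printf("Max threads per block: %d\n", p.maxThreadsPerBlock);
    printf("Max block dimensions: %d x %d x %d\n",
           p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
    printf("Max grid dimensions: %d x %d x %d\n",
           p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
    return true;
}
```

Calling printLaunchLimits(0) prints the limits for the first device. fitsLimits takes the cudaDeviceProp as a parameter so that a configuration can be checked before launching, which is usually easier than decoding a failed launch afterwards.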
Again, this is all explained in the programming guide: