device organization


I'm not sure I fully get the organization of a CUDA device yet, and I'm trying to understand the output of deviceQuery.

Does this mean I can have 65535 blocks total in one big grid, with 512 threads running in each block? Does this also mean that at any point I actually have only 32 threads running at once?


Well, you can launch up to 65535 * 65535 blocks in a single call, which is a lot. And 32 is just the warp size. G80 has 16 multiprocessors, each of which can keep 24 warps running in an interleaved fashion, so the device is running 16 * 24 * 32 = 12288 threads at once.