Reeeealll basic question here. Hope I don’t sound lazy or stupid. Anyway: Could someone enlighten me as to the meaning of some of the output of pgaccelinfo? I am particularly wondering about “Maximum Block Dimensions” and “Maximum Grid Dimensions.” I have a rudimentary understanding of threads and thread blocks, but I’m not sure what pgaccelinfo is telling me here. For instance, is “Maximum Block Dimensions” the maximum number of blocks you can have in the three “directions”, or is it the maximum number of threads per block in each “direction”, or …? I don’t think I have a clue as to what “Maximum Grid Dimensions” is. Thanks.

Each CUDA-enabled device has a limit on the number of threads that can be included in a block, as well as on how they are indexed. This thread indexing is the “Block Dimension”. So on my laptop’s Quadro FX 880M, I can have up to 512 threads per block, spread across three dimensions. The Max Block Dimension is 512,512,64. So if I used all 512 threads, I could index them as a 512x1x1 block, or a 1x512x1 block, or a 4x4x32 block, etc. So the “Maximum Block Dimensions” is the maximum number of threads each dimension may index, provided the product of the three dimensions does not exceed the maximum number of threads per block.
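To make the rule concrete, here is a minimal Python sketch that checks a candidate block shape against the limits quoted for the Quadro FX 880M. The function name `block_shape_is_valid` and the constants are made up for illustration and are not part of pgaccelinfo or CUDA.

```python
from math import prod

# Limits quoted above for the Quadro FX 880M (illustrative constants).
MAX_BLOCK_DIMS = (512, 512, 64)
MAX_THREADS_PER_BLOCK = 512


def block_shape_is_valid(shape, max_dims=MAX_BLOCK_DIMS,
                         max_threads=MAX_THREADS_PER_BLOCK):
    """Return True if every dimension is within its own limit and the
    product of the dimensions fits in the per-block thread limit."""
    within_dims = all(1 <= n <= limit for n, limit in zip(shape, max_dims))
    return within_dims and prod(shape) <= max_threads


if __name__ == "__main__":
    for shape in [(512, 1, 1), (1, 512, 1), (4, 4, 32), (1, 1, 512), (32, 32, 1)]:
        print(shape, block_shape_is_valid(shape))
```

The first three shapes pass. 1x1x512 fails because the third dimension is capped at 64, and 32x32x1 fails because 1024 threads exceeds the 512-thread total even though each dimension is within its own limit.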

Similarly, the “Maximum Grid Dimensions” is the maximum number of blocks a grid may index in each dimension. For my 880M, the grid dimension is 64k x 64k x 1. However, unlike the threads, you can use all 64k in each of the two dimensions at the same time.

OK, one more question if you don’t mind: The overall maximum number of threads, is that given in pgaccelinfo by the “Max Threads Per SMP” field? If not, could you tell me where it is given? Thanks!

“Max Threads Per SMP” is the maximum number of threads that can be running on a given streaming multiprocessor (SMP) at a given time. So on my C1060, I can have 512 threads per block and 1024 threads per SMP. So if I use all 512 threads in a block, two blocks can be running on the same SMP (assuming no other limits are hit). If I only use 256 threads per block, 4 blocks can be running.
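The arithmetic here is just an integer division. A rough sketch, which looks only at the thread-count limit and ignores any other per-SMP limit; the helper name `blocks_per_smp` is hypothetical:

```python
def blocks_per_smp(threads_per_block, max_threads_per_smp=1024):
    """Blocks that fit on one SMP at once, judged by thread count alone.

    The default of 1024 is the C1060 figure quoted above; other limits
    on the device are deliberately not modelled here.
    """
    return max_threads_per_smp // threads_per_block


if __name__ == "__main__":
    for threads in (512, 256):
        print(threads, "threads per block:", blocks_per_smp(threads), "blocks per SMP")
```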

The maximum total possible threads would be the max threads per block multiplied by the max grid size. Not all of them would be running at the same time, though.

On the C1060 this would be 512 x 65535 x 65535 x 1. On a GTX690 it’s quite a bit larger, at 1024 x 2147483647 x 65535 x 65535.
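If you want the actual numbers, a quick Python sketch multiplies them out. It uses only the figures quoted in this thread; the helper name `max_total_threads` is made up for illustration.

```python
from math import prod


def max_total_threads(max_threads_per_block, max_grid_dims):
    """Max threads per block times the number of blocks in a full grid."""
    return max_threads_per_block * prod(max_grid_dims)


# Figures as quoted in this thread.
c1060 = max_total_threads(512, (65535, 65535, 1))
gtx690 = max_total_threads(1024, (2147483647, 65535, 65535))

if __name__ == "__main__":
    print("C1060 :", c1060)
    print("GTX690:", gtx690)
    print("ratio :", gtx690 // c1060)
```

Python integers have arbitrary precision, so the GTX690 product is computed exactly even though it would overflow a 64-bit integer.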

