Limit to Number of Blocks? Noob Question

I’ve got a question about the limits on the number of thread blocks.

When you launch a __global__ function, you need to specify the number of blocks and threads you’ll be using. I’m designing my program so that it can run an unknown number of simulations, with each simulation carried out by one thread. Since there is a limited amount of memory, I need to break the work into manageable chunks. After some trial and error, I arrived at my configuration from the occupancy calculator spreadsheet.

I can see that the spreadsheet makes no mention of the number of blocks when you enter your resource usage. Why doesn’t it take that into account? Does the scheduler on the GPU somehow handle kernel invocations with many blocks? Say 64 blocks fit on the GPU at once and each block contains 384 threads (24,576 simulations in total); if I launch five times that many blocks, will the GPU run them as 5 sets of 64? In other words, if the number of blocks exceeds the available resources, does the hardware manage this for me?

What I’ve been doing is choosing a number of threads based on the occupancy calculator and plugging that into my program. Then I calculate the number of blocks that will be required based on the number of simulations I need to run. Here’s the relevant code:

    /* Based on the CUDA occupancy calculator,
       which reports 50% occupancy for this configuration */
    const int numThreads = 384;
    int numSimulations;
    int numBlocks;

    readFile(fp, &numSimulations);

    /* Number of blocks is proportional to the number of simulations;
       round up so the last partial block is covered */
    numBlocks = (int) ceil((float) numSimulations / (float) numThreads);

    /* Device grid and block size */
    dim3 grid(numBlocks);
    dim3 threads(numThreads);

    /* Dynamic shared memory size in bytes */
    size_t sharedMemorySize = 2 * sizeof(float) * nr;

    photonSim<<<grid, threads, sharedMemorySize>>>(d_Rr, d_Tr, d_randNumArray);

I guess I just need a sanity check: does this work? Can you have an effectively unlimited number of blocks, but only a limited number of threads per block?

If not, should I have set up a loop with something like this?

    for (numBlocks = 64; numBlocks < 320; numBlocks += 64)
    {
        dim3 grid(numBlocks);
        dim3 threads(numThreads);

        /* Dynamic shared memory size in bytes */
        sharedMemorySize = 2 * sizeof(float) * nr;

        aSim<<<grid, threads, sharedMemorySize>>>(d_Rr, d_Tr, d_randNumArray);
    }

If there is a limit to the number of blocks, then how can I calculate it?

Thanks for your help,

Craig

You can run up to 65535 * 65535 blocks. See Appendix A in the programming guide; this is the hardware limit on the grid dimensions.

The occupancy calculator doesn’t take the number of blocks into account because it assumes you have launched enough blocks to saturate all the multiprocessors. Up to 8 blocks can run concurrently on a single multiprocessor (assuming their register and shared memory usage allows it).
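As an aside, if you’d rather not keep the spreadsheet in sync with your kernel by hand, newer CUDA toolkits (6.5 and later) can do the same calculation at runtime. A minimal sketch, with a made-up kernel name:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void simKernel(float *out, int n)
    {
        /* kernel body omitted */
    }

    int main(void)
    {
        int blocksPerSM = 0;

        /* How many 384-thread blocks of simKernel fit on one
           multiprocessor, given its register and shared memory usage
           (0 bytes of dynamic shared memory assumed here)? */
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, simKernel, 384, 0);

        printf("Resident blocks per multiprocessor: %d\n", blocksPerSM);
        return 0;
    }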

On the device, all gridDim.x * gridDim.y blocks start in a waiting queue, and the multiprocessors are filled with running blocks drawn from that queue. As soon as one block completes, its slot is immediately filled with another block from the queue.
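So a single launch covering all your simulations is fine; there is no need for the host-side loop. A minimal sketch (kernel name and sizes made up for illustration) of an oversubscribed launch with a bounds guard for the final, partially filled block:

    #include <cuda_runtime.h>

    /* One thread per simulation; the guard handles the last,
       partially filled block. */
    __global__ void simKernel(float *out, int numSimulations)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < numSimulations)
            out[idx] = (float) idx;   /* placeholder for the real work */
    }

    int main(void)
    {
        const int numSimulations = 1000000;   /* far more than fit at once */
        const int numThreads = 384;
        const int numBlocks = (numSimulations + numThreads - 1) / numThreads;

        float *d_out;
        cudaMalloc(&d_out, numSimulations * sizeof(float));

        /* Single launch; the hardware drains the block queue itself */
        simKernel<<<numBlocks, numThreads>>>(d_out, numSimulations);
        cudaDeviceSynchronize();

        cudaFree(d_out);
        return 0;
    }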

Thanks very much!

I think the maximum number of blocks in one grid dimension is 65535. If you use more than 65535 (for example, 65536), the kernel function won’t launch.

You can run 65535 * 65535 blocks, so beyond 65535 you would have to launch a 2D grid. Neither of the two dimensions is allowed to exceed 65535.
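For example, here is a sketch (kernel and variable names made up) of splitting an oversized one-dimensional block count across two grid dimensions and flattening the index back inside the kernel:

    #include <cuda_runtime.h>

    __global__ void bigSim(float *out, int numSimulations)
    {
        /* Recover a linear block index from the 2D grid */
        int block = blockIdx.y * gridDim.x + blockIdx.x;
        int idx = block * blockDim.x + threadIdx.x;
        if (idx < numSimulations)
            out[idx] = 0.0f;          /* placeholder for the real work */
    }

    void launchBigSim(float *d_out, int numSimulations, int numThreads)
    {
        int totalBlocks = (numSimulations + numThreads - 1) / numThreads;

        /* Keep each grid dimension at or below the 65535 hardware limit */
        int gridX = totalBlocks <= 65535 ? totalBlocks : 65535;
        int gridY = (totalBlocks + gridX - 1) / gridX;

        dim3 grid(gridX, gridY);
        bigSim<<<grid, numThreads>>>(d_out, numSimulations);
    }

The grid may launch a few more blocks than needed (gridX * gridY rounds up), which is why the kernel keeps the idx < numSimulations guard.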