I was recently going through the reduction sample and came across this function:
////////////////////////////////////////////////////////////////////////////////
// Compute the number of threads and blocks to use for the reduction
// We set threads / block to the minimum of maxThreads and n/2.
////////////////////////////////////////////////////////////////////////////////
void getNumBlocksAndThreads(int n, int maxBlocks, int maxThreads, int &blocks, int &threads)
{
    if (n == 1)
    {
        threads = 1;
        blocks = 1;
    }
    else
    {
        threads = (n < maxThreads*2) ? nextPow2(n / 2) : maxThreads;
        blocks = max(1, n / (threads * 2));
    }

    blocks = min(maxBlocks, blocks);
}
Isn’t the number of blocks computed here too small? Shouldn’t the number of blocks be (1 + n / threads)?
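To make the question concrete, here is a minimal host-only sketch that puts the two block counts side by side. It is not the sample's code verbatim: nextPow2 is reimplemented here on the assumption that it rounds up to the next power of two, expectedBlocks is a hypothetical helper that encodes the formula I had in mind, and the input values mentioned afterwards are picked purely for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: rounds x up to the next power of two (x >= 1).
// Reimplemented for this sketch; the sample's helper may differ.
unsigned int nextPow2(unsigned int x)
{
    assert(x >= 1);
    --x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return ++x;
}

// Same logic as the function quoted above, with std::min / std::max.
void getNumBlocksAndThreads(int n, int maxBlocks, int maxThreads, int &blocks, int &threads)
{
    if (n == 1)
    {
        threads = 1;
        blocks = 1;
    }
    else
    {
        threads = (n < maxThreads * 2) ? static_cast<int>(nextPow2(n / 2)) : maxThreads;
        blocks = std::max(1, n / (threads * 2));
    }

    blocks = std::min(maxBlocks, blocks);
}

// Hypothetical helper: the block count I expected instead.
int expectedBlocks(int n, int threads)
{
    return 1 + n / threads;
}
```

For example, with n = 1024, maxBlocks = 64 and maxThreads = 256, getNumBlocksAndThreads gives threads = 256 and blocks = 2, whereas expectedBlocks(1024, 256) gives 5.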