hitting the grid size limitation

xargon · November 8, 2009, 2:01pm

Hello,

The grid size is limited to 65535 per dimension. In my code, I am hitting this limitation quite frequently and would like to know what the best way to work around it is. Should I try and split the kernel into multiple calls or is there some way to work around this limitation?

So currently, my code does something as follows:

[codebox]

dim3 B2(256,1,1);

dim3 G2(Grid_size,1,1); // The Grid_size can exceed beyond the allowed limit…

MyKernel<<<G2, B2>>>();

[/codebox]

My Grid_size can exceed the limit of 65535 per dimension. I do not really know how to seamlessly take advantage of the other dimensions. When I just try to set them to some other number, my kernel times out…

Thanks for any help you can give me.

/x

avidday · November 8, 2009, 2:29pm

It is explained in Section 2.2 of the programming guide, but the premise is exactly the same as column-major ordered storage in arrays. The ID of a thread within a 3D block of dimensions (Dx,Dy,Dz) is:

dIdx = threadIdx.x + threadIdx.y*Dx + threadIdx.z*Dx*Dy

and the index of any given block in a grid of dimensions (Gx,Gy) is:

gIdx = blockIdx.x + blockIdx.y*Gx

therefore the “global” index of any thread is

Idx = dIdx + gIdx*Dx*Dy*Dz

That gives you 512 * 65535 * 65535 = 2198956147200 indices to work with.
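As a sanity check on the indexing above, here is a small host-side sketch in plain C++ (not a kernel) that recomputes the flat index for every thread of a tiny launch and confirms that each value from 0 up to the total thread count appears exactly once. `Dim3`, `flat_index` and `covers_each_index_once` are made-up names for illustration only. The block index is scaled by the number of threads per block so that neighbouring blocks do not overlap.

```cpp
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3 so the arithmetic runs on the host.
struct Dim3 { unsigned x, y, z; };

// Flat index of one thread: index inside its block, plus the block's
// linear index scaled by the number of threads per block.
unsigned long long flat_index(Dim3 tid, Dim3 bdim, Dim3 bid, Dim3 gdim)
{
    const unsigned long long threads_per_block =
        static_cast<unsigned long long>(bdim.x) * bdim.y * bdim.z;
    const unsigned long long dIdx =
        tid.x + tid.y * bdim.x + tid.z * bdim.x * bdim.y;
    const unsigned long long gIdx =
        bid.x + static_cast<unsigned long long>(bid.y) * gdim.x;
    return dIdx + gIdx * threads_per_block;
}

// Walk over every (block, thread) pair of a small 2D-grid / 3D-block launch
// and check that the flat indices form exactly the range [0, total).
bool covers_each_index_once(Dim3 gdim, Dim3 bdim)
{
    const unsigned long long total =
        static_cast<unsigned long long>(gdim.x) * gdim.y * bdim.x * bdim.y * bdim.z;
    std::vector<int> hits(total, 0);

    for (unsigned by = 0; by < gdim.y; ++by)
    for (unsigned bx = 0; bx < gdim.x; ++bx)
    for (unsigned tz = 0; tz < bdim.z; ++tz)
    for (unsigned ty = 0; ty < bdim.y; ++ty)
    for (unsigned tx = 0; tx < bdim.x; ++tx)
    {
        Dim3 tid = {tx, ty, tz};
        Dim3 bid = {bx, by, 0};
        const unsigned long long idx = flat_index(tid, bdim, bid, gdim);
        if (idx >= total) return false;   // index fell outside the launch
        ++hits[idx];
    }

    for (int h : hits)
        if (h != 1) return false;         // some index was skipped or duplicated
    return true;
}
```

Only use this with small dimensions, since it allocates one counter per thread. In a real kernel the same expressions would use the built-in `threadIdx`, `blockDim`, `blockIdx` and `gridDim` instead of the arguments passed here.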

Cygnus_X1 · November 8, 2009, 2:55pm

Just make sure that, once you make this change, you never refer to blockIdx.x directly again; use your gIdx variable instead.
This can be a pain if you have a device function somewhere that is called from two kernels launched with different configurations.
Personally, I try to avoid using higher dimensions, even where they would be the natural choice, for exactly these reasons.

xargon · November 13, 2009, 1:40pm

Hello,

Thanks for the reply. Just a quick clarification though:

So, the thread ID in a 3D block can be gotten as:

[codebox]

const int tid = (threadIdx.x + threadIdx.y*blockDim.x + threadIdx.z*blockDim.x*blockDim.y);

[/codebox]

How do I get the dimensions of the grid though from a kernel? So, to get the index of a block I have:

[codebox]

const int bid_3D = blockIdx.x + blockIdx.y * (Grid_dimension_x);

[/codebox]

How do I get the dimensions of my grid from a kernel?

Many thanks,

/x

nbell · November 13, 2009, 3:36pm

Rather than using two dimensional grids you could have the threads in your kernel iterate. This example works for arbitrarily large N no matter what the block dimensions are.

[codebox]

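__global__
void set_to_zero(float * ptr, unsigned int N)
{
    // Total number of threads in the launch; each thread advances by this stride.
    const unsigned int grid_size = blockDim.x * gridDim.x;

    // First element handled by this thread.
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    while(i < N)
    {
        ptr[i] = 0.0f;
        i += grid_size;
    }
}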

[/codebox]

FWIW this is the strategy used in Thrust's algorithms.
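To illustrate why this loop still touches every element when the grid is capped below what N would otherwise need, here is a minimal host-side sketch in plain C++. It picks a block count clamped to the 65535 limit discussed in this thread and emulates the striding on the CPU, counting how often each element is written. `MAX_GRID_X`, `capped_blocks` and `emulate_grid_stride` are names invented for this example, not part of CUDA or Thrust.

```cpp
#include <vector>

// Per-dimension grid limit quoted in this thread (an assumption for other hardware).
const unsigned MAX_GRID_X = 65535;

// Number of blocks to launch: enough to cover N, but at least 1 and never above the limit.
unsigned capped_blocks(unsigned N, unsigned block_size)
{
    unsigned long long needed =
        (static_cast<unsigned long long>(N) + block_size - 1) / block_size;
    if (needed < 1) needed = 1;
    if (needed > MAX_GRID_X) needed = MAX_GRID_X;
    return static_cast<unsigned>(needed);
}

// CPU emulation of the striding loop: every (block, thread) pair starts at its
// own global index and advances by the total number of launched threads.
std::vector<int> emulate_grid_stride(unsigned N, unsigned grid_dim, unsigned block_dim)
{
    std::vector<int> hits(N, 0);
    const unsigned grid_size = block_dim * grid_dim;

    for (unsigned b = 0; b < grid_dim; ++b)
        for (unsigned t = 0; t < block_dim; ++t)
            for (unsigned i = b * block_dim + t; i < N; i += grid_size)
                ++hits[i];   // stands in for ptr[i] = 0.0f in the kernel

    return hits;
}
```

With a launch of 3 blocks of 32 threads and N = 1000, each of the 96 threads handles about ten elements, and every entry of the returned vector ends up equal to 1. On the device, the result of `capped_blocks` would be used as the grid size in the launch configuration.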

Tigga · November 13, 2009, 3:46pm

The only time I’ve hit the max grid dimension I tried both the above methods and found that using a 2D grid was faster. I guess it depends a bit on your algorithm.
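For the 2D-grid route, the host side has to fold the required block count into two dimensions that each stay within the limit. A rough sketch of one way to do that in plain C++ follows; `MAX_GRID_DIM`, `Grid2D` and `split_blocks` are hypothetical names, and the 65535 value is simply the limit mentioned at the top of this thread.

```cpp
#include <cassert>

// Per-dimension grid limit quoted in this thread (an assumption for other hardware).
const unsigned MAX_GRID_DIM = 65535;

// Hypothetical pair of grid dimensions; on the device side this would feed dim3(x, y, 1).
struct Grid2D { unsigned x, y; };

// Choose x and y so that x * y >= total_blocks with both within the limit.
Grid2D split_blocks(unsigned long long total_blocks)
{
    assert(total_blocks >= 1);
    assert(total_blocks <= static_cast<unsigned long long>(MAX_GRID_DIM) * MAX_GRID_DIM);

    Grid2D g;
    if (total_blocks <= MAX_GRID_DIM) {
        // Fits in one dimension: keep the grid 1D.
        g.x = static_cast<unsigned>(total_blocks);
        g.y = 1;
    } else {
        // Rows needed if every row were full width, then shrink the width
        // so that as few surplus blocks as possible are launched.
        g.y = static_cast<unsigned>((total_blocks + MAX_GRID_DIM - 1) / MAX_GRID_DIM);
        g.x = static_cast<unsigned>((total_blocks + g.y - 1) / g.y);
    }
    return g;
}
```

Because x * y can be slightly larger than the requested count (100001 blocks become a 50001 x 2 grid, one block too many), a kernel launched this way should compute its linear block index as blockIdx.x + blockIdx.y * gridDim.x and return early when that index is at or beyond the real block count.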
