Dimensions of a Block and a Grid

Alark · April 28, 2008, 9:18pm

Hello,

From what I have understood by playing around with CUDA, it seems that one can have 3-dimensional thread IDs, with the thread block defined as

dim3 threadBlock(8, 8, 8);

The grid dimensions for it can be defined as
dim3 kernelBlockGrid(1, 1, 1);

If I change the last dimension of kernelBlockGrid to anything other than 1, I get the following error:

cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/execute.cu, line 1038
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/execute.cu, line 297
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 119
cufft: ERROR: CUFFT_EXEC_FAILED

As far as I understand, when I have something like

myKernel<<<kernelBlockGrid, threadBlock>>> …

in my program, kernelBlockGrid defines the dimensions of the grid and threadBlock defines the layout of the threads in the block. If the maximum number of threads per block is 512, how does having maximum block dimensions of 512 x 512 x 64 help? Maybe a scenario in which this is useful would help me understand it a bit better.
Thanks.

Regards,

-Alark

seibert · April 29, 2008, 2:34am

According to the Programming Guide (Sec. 2.2.2, pg. 8 of the CUDA 2.0 manual), grids can only be 2D, not 3D. So you aren’t allowed to set the last component of kernelBlockGrid to anything but 1, despite the fact that it is a dim3 variable. (Maybe someday CUDA hardware will support 3D grid dimensions.)

Alark · April 29, 2008, 3:08am

Thanks for your help. I had found that the grid dimensions were 2D. I was wondering if there was any way to get around it. I am experimenting with setting Y-dim = Y-dim * Z-dim and then accessing Y-dim in the kernel accordingly to try and ‘extract’ a z-dim from it. I have threadIdx.z on the GPU, I just need some way to get a z-id for a grid. I am trying to do a 3D operation on a 3D matrix :| Please let me know if you have a better way of doing it. Thanks.
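The Y/Z folding described above could be sketched roughly as follows. This only illustrates the index arithmetic, written as plain host-side C++ so it can be checked without a GPU. The names FoldedBlock, unfoldBlockY and blocksY are hypothetical, and inside a kernel the foldedY argument would be blockIdx.y.

```cpp
// Hypothetical helper: recover separate block y/z indices from a grid whose
// y-dimension was launched as blocksY * blocksZ (grid z left at 1).
struct FoldedBlock {
    unsigned int y;  // logical block index along y
    unsigned int z;  // logical block index along z
};

// foldedY plays the role of blockIdx.y inside a kernel.
// blocksY is the number of blocks that cover the y extent of the volume.
inline FoldedBlock unfoldBlockY(unsigned int foldedY, unsigned int blocksY) {
    FoldedBlock b;
    b.y = foldedY % blocksY;  // position inside one z-slab
    b.z = foldedY / blocksY;  // which z-slab this block belongs to
    return b;
}
```

For example, with blocksY = 4 and 8 slabs along z, the grid would be launched with a y-dimension of 32, and a block whose blockIdx.y is 13 maps to y = 1, z = 3.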

What I am still a bit confused about is the block dimensions. I understand that one can have 512 threads per block and that we specify the grid dimensions and the number of threads per block when invoking the kernel. Then what is the benefit of having max block dimensions of 512 x 512 x 64? :|

-Alark

mfatica · April 29, 2008, 3:24am

The product threadBlock.x * threadBlock.y * threadBlock.z needs to be <= 512, with the additional constraints that threadBlock.x <= 512, threadBlock.y <= 512 and threadBlock.z <= 64.

So you can have a block like (512,1,1) or (1,512,1) or (2,4,64).
A block (1,1,512) is not valid.
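As a rough illustration, the rules above can be written as a small host-side check. The limits are the ones quoted in this thread for CUDA 2.0-era hardware (newer GPUs have different limits), and the helper name isValidBlockShape and the k-prefixed constants are hypothetical.

```cpp
// Limits quoted in this thread for CUDA 2.0-era hardware; newer GPUs differ.
const unsigned int kMaxThreadsPerBlock = 512;
const unsigned int kMaxBlockX = 512;
const unsigned int kMaxBlockY = 512;
const unsigned int kMaxBlockZ = 64;

// Hypothetical helper: true if a block of shape (x, y, z) satisfies the limits above.
inline bool isValidBlockShape(unsigned int x, unsigned int y, unsigned int z) {
    // Every extent must be at least 1.
    if (x == 0 || y == 0 || z == 0) return false;
    // Per-dimension limits.
    if (x > kMaxBlockX || y > kMaxBlockY || z > kMaxBlockZ) return false;
    // Total thread count limit. The extents are already bounded above,
    // so this product cannot overflow 32 bits.
    return x * y * z <= kMaxThreadsPerBlock;
}
```

With these limits, (512,1,1), (1,512,1) and (2,4,64) pass, while (1,1,512) fails on the z limit and (16,16,4) fails on the total of 1024 threads.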

Alark · April 29, 2008, 1:24pm

Thanks. That helps a lot.

Any suggestion on how I could perform operations on a 3D array? I’m trying to cast it as a 2D problem but for some reason in the kernel, the x, y, z indices into the 3D texture are incorrect. Thanks.

Regards,

-Alark

moulik · April 29, 2008, 3:54pm

You can try the following… it is not sexy, but it works

tx = dimension_x * dimension_y

ty = dimension_z

y = tx / dimension_x;   // integer division by the row width

x = tx - y*dimension_x;

z = ty;

now use the xyz for your texture fetch from a linear array([x+y*x_dimension…])

or, you could wait for someone to give you a less naive answer…
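A minimal sketch of this kind of index unpacking, written as plain host-side C++ so the arithmetic can be checked without a GPU. The names Index3, unflattenIndex and linearOffset are hypothetical, and the x-fastest memory layout used for the z term of the offset is an assumption, since the post does not spell it out.

```cpp
struct Index3 {
    int x;
    int y;
    int z;
};

// tx: flattened index over one x-y plane, in the range 0 .. dimX*dimY - 1
// ty: index along z
inline Index3 unflattenIndex(int tx, int ty, int dimX) {
    Index3 i;
    i.y = tx / dimX;        // integer division gives the row
    i.x = tx - i.y * dimX;  // what is left over is the column
    i.z = ty;
    return i;
}

// Offset of (x, y, z) in a linear array, assuming x varies fastest.
inline int linearOffset(Index3 i, int dimX, int dimY) {
    return i.x + i.y * dimX + i.z * dimX * dimY;
}
```

The divisor is the x extent (the row width). For a cubic volume the choice makes no difference, but for a non-cubic one dividing by the y extent would give different indices.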

Alark · April 29, 2008, 8:56pm

Thanks for that. I had something similar in ty (instead of tx as per your suggestion). I tried your suggestion too and it clearly iterates the x, y, z indices correctly.

My grid and thread block setup is as follows (just in case I’m doing something wrong):

dim3 threadBlockData(8 * 8, 8, 1);

// This is using the convolution2D demo.

dim3 dataBlockGrid(iDivUp(FFT_W , threadBlockData.x ), iDivUp(FFT_D, threadBlockData.y), 1);

padData<<<dataBlockGrid, threadBlockData>>> ..
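For readers unfamiliar with iDivUp: assuming it is a ceiling division, as the way it is used here suggests, it could look roughly like the sketch below. The names ceilDiv and insideData are hypothetical stand-ins, not the SDK's own code, and the bounds check shows why a kernel launched on a rounded-up grid usually needs to guard its accesses.

```cpp
// Assumed behaviour of iDivUp: the smallest n such that n * b >= a
// (for a >= 0 and b > 0).
inline int ceilDiv(int a, int b) {
    return (a + b - 1) / b;
}

// Because the grid is rounded up, blocks on the edge can contain threads
// that fall outside the data. A kernel would typically return early
// when this is false.
inline bool insideData(int x, int y, int width, int height) {
    return x < width && y < height;
}
```

For a width of 100 and a block width of 64, ceilDiv gives 2 blocks, i.e. 128 threads along x, so the threads with x >= 100 have no data element to work on.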

So I am actually trying to do 3D convolution on a 3D array. I feel like I’m close since some answers are correct, but most just look like they’re off by a bit. I am comparing the GPU output with that of the CPU convolution. I too would have thought that neat indexing with 3D CUFFT would do the trick. Any suggestions? :| Thanks.

Regards,

Alark

asherimtiaz · May 1, 2008, 4:12pm

Hi nvidia,

I was testing different configurations in CUDA before choosing one. My program’s output confirms that the number of threads per block must be less than or equal to 512, but some configurations that look valid are not working.

for example

Config: By=1, Bx=1, Ty=512, Tx=1 : Not Valid

Config: By=1, Bx=1, Ty=32, Tx=1 : Not Valid

Config: By=1, Bx=1, Ty=1, Tx=1 : Not Valid

Config: By=1, Bx=1, Ty=1, Tx=2 : Blocks=1 Threads=2 Valid

Config: By=1, Bx=1, Ty=1, Tx=4 : Blocks=1 Threads=4 Valid

Config: By=1, Bx=1, Ty=1, Tx=8 : Blocks=1 Threads=8 Valid

Config: By=1, Bx=1, Ty=1, Tx=16 : Blocks=1 Threads=16 Valid

Config: By=1, Bx=1, Ty=1, Tx=32 : Blocks=1 Threads=32 Valid

Config: By=1, Bx=1, Ty=1, Tx=64 : Blocks=1 Threads=64 Valid

Config: By=1, Bx=1, Ty=1, Tx=128 : Blocks=1 Threads=128 Valid

Config: By=1, Bx=1, Ty=1, Tx=256 : Blocks=1 Threads=256 Valid

Config: By=1, Bx=1, Ty=1, Tx=512 : Blocks=1 Threads=512 Valid

(where Tz = 1 in all cases)

Either my test program is giving incorrect results or there is some other technical reason for this. (My test program file is attached.)

Kindly help me with your comments.

thanks

-Asher
configTest.txt (4.3 KB)
