Block size and grid size

n0mad · April 25, 2009, 5:28pm

Hello,

I just started learning CUDA and I’m confused by block and grid sizes. “Standard” C programming with pthreads or fork() is not very difficult for me, but I don’t really understand the CUDA architecture. So I have a few questions…

I don’t understand how to choose the right block size, the right grid size and, of course, the right number of threads! Is this related to the hardware, the application, or both?
Can GPU threads be thought of as CPU threads? Or does the number of blocks correspond to the number of CPU threads?
Do you have information/papers/links “comparing” the CPU approach with the GPU approach?

My card is a GeForce 9300M GS 512MB.

I don’t have much of a mathematics background… :o)

Thanks
(yes, these are newbie questions =) )

navier-stokes · April 25, 2009, 5:45pm

The block and grid sizes depend, on the one hand, on your algorithm. On the other hand, the hardware imposes restrictions on them (for example, a maximum number of threads per block). Besides that, you have to make sure the GPU is kept optimally loaded.

A GPU and an x86 CPU are completely different architectures, so CPU threads aren’t comparable to GPU threads at all.

Refer to the CUDA Programming Guide and the CUDA Technical Training.
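To see the hardware restrictions of a particular card, the runtime can report them through cudaDeviceProp. The sketch below prints the launch-related fields for device 0 and checks a proposed <<<grid, block>>> against them. The helper names print_launch_limits and config_fits are hypothetical, not part of CUDA, and the check only covers size limits; register and shared-memory use can still make a launch fail.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Prints the launch-related limits of device 0 (hypothetical helper).
void print_launch_limits()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return;
    }
    std::printf("%s: %d multiprocessors, at most %d threads per block\n",
                prop.name, prop.multiProcessorCount, prop.maxThreadsPerBlock);
    std::printf("max block dims %d x %d x %d, max grid dims %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2],
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
}

// True if <<<grid, block>>> respects the size limits recorded in prop
// (hypothetical helper). Registers and shared memory are NOT checked.
bool config_fits(dim3 grid, dim3 block, const cudaDeviceProp& prop)
{
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    // Total threads in one block must not exceed maxThreadsPerBlock.
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    if (threads > static_cast<unsigned long long>(prop.maxThreadsPerBlock))
        return false;
    // Each block dimension has its own limit as well.
    if (block.x > static_cast<unsigned>(prop.maxThreadsDim[0]) ||
        block.y > static_cast<unsigned>(prop.maxThreadsDim[1]) ||
        block.z > static_cast<unsigned>(prop.maxThreadsDim[2]))
        return false;
    // And so does each grid dimension.
    return grid.x <= static_cast<unsigned>(prop.maxGridSize[0]) &&
           grid.y <= static_cast<unsigned>(prop.maxGridSize[1]) &&
           grid.z <= static_cast<unsigned>(prop.maxGridSize[2]);
}
```

For example, with a 512-thread-per-block limit (as on compute capability 1.x hardware), config_fits(dim3(1), dim3(16, 48), prop) returns false, because 16 x 48 = 768 threads.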


n0mad · April 27, 2009, 5:29pm

Hello,

I’m starting to understand the CUDA logic… but I have a question:

Let’s assume we want to process a 16x48 matrix (say, add 1 to each element). So my matrix has 768 elements. I also know, from the CUDA Programming Guide, that a block can hold at most 512 threads and that the blocks of a grid are distributed across the multiprocessors (MPs).

So if I have only 1 MP, there is nothing to think about, and I launch my kernel as follows:

func <<< 1, dim3(16,48) >>>(...)

However, if I have a card with 30 MPs, do I need to launch:

func <<<1, dim3(16,32) >>> (..)

for better performance? By better performance I mean a shorter time to solution. (I’m thinking of making blocks of 512 threads each.)

Am I right? (-:
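Note that 16 x 48 = 768 threads is more than the 512-thread-per-block limit quoted in the question, so a single dim3(16,48) block cannot be launched on hardware with that limit. A minimal sketch of one alternative, assuming the matrix is stored row-major as floats already in device memory: tile it with 16 x 16 blocks (256 threads each) in a 1 x 3 grid. addOne2D, grid_for and add_one_16x48 are illustrative names, not from the thread.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Adds 1 to every element of a width x height row-major matrix.
// One thread per element; threads outside the matrix (possible when the
// grid over-covers it) do nothing.
__global__ void addOne2D(float* m, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height)
        m[row * width + col] += 1.0f;
}

// Blocks needed in x and y so that tiles of size `block` cover the matrix
// (ceiling division in each dimension).
dim3 grid_for(dim3 block, int width, int height)
{
    assert(block.x > 0 && block.y > 0 && width > 0 && height > 0);
    return dim3((width + block.x - 1) / block.x,
                (height + block.y - 1) / block.y);
}

// Launches addOne2D on a device pointer holding a 16 x 48 matrix (768 elements).
void add_one_16x48(float* d_m)
{
    const int width = 16, height = 48;
    dim3 block(16, 16);                          // 256 threads, under the 512 limit
    dim3 grid = grid_for(block, width, height);  // 1 x 3 blocks
    addOne2D<<<grid, block>>>(d_m, width, height);
}
```

The bounds check keeps the same kernel correct when the matrix size is not a multiple of the block size.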

Thanks,

navier-stokes · April 27, 2009, 5:54pm

One block is executed by only one MP. BUT that does not mean that one MP can only process one block.


<<<dimGrid, dimBlock>>> describes the total set of threads, which is distributed among all MPs. It is not a grid per MP.
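In other words, a launch creates grid.x*grid.y*grid.z blocks of block.x*block.y*block.z threads each, and the hardware hands whole blocks to MPs, possibly several blocks per MP. A small sketch of that bookkeeping with hypothetical helper names: total_threads counts the threads in one launch, and writeGlobalIndex shows the usual way a thread finds its position in the whole launch (1D case).

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Threads created by <<<grid, block>>>: (blocks in grid) x (threads per block).
// This is the whole launch, spread over all MPs, not a per-MP quantity.
unsigned long long total_threads(dim3 grid, dim3 block)
{
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    unsigned long long blocks  = 1ULL * grid.x * grid.y * grid.z;
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    return blocks * threads;
}

// Each thread computes its global position from its block and thread indices.
__global__ void writeGlobalIndex(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)      // the launch may contain more threads than elements
        out[i] = i;
}
```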

Regards

Navier

n0mad · April 27, 2009, 6:05pm

OK, OK, OK… if I launch a kernel like this:

dim3 dimBlock(16,48);

func <<< 1, dimBlock >>> (....);

I’m creating 1 grid (1 by 1) containing X blocks of 16x48x1, right? Where X is determined by… I don’t know =) How can I know? Does it depend on the data to be processed?

I mean: if I want to process a 768-element array and I launch the kernel as follows:

dim3 dimBlock(16,32);

func <<< 1, dimBlock >>> (myArray);

does it create 2 blocks? (16x32 = 512 => 2x512 = 1024 > 768 => 2 blocks)

Help! =)
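As for whether <<< 1, dimBlock >>> creates 2 blocks: it does not. The first launch parameter is the number of blocks, and CUDA does not derive it from the data, so that launch creates exactly one block of 16 x 32 = 512 threads and, with one element per thread, covers only 512 of the 768 elements. To get 2 blocks you pass 2, typically computed by ceiling division, and let the surplus threads skip work. A minimal sketch, assuming myArray holds n floats in device memory; blocks_for, global_index, addOne and add_one_768 are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Blocks needed so that each of n elements gets a thread:
// ceiling of n / threadsPerBlock.
int blocks_for(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Flat index of a thread across a 1D grid of 2D (bdx x bdy) blocks.
// __host__ __device__ so the same formula can also be checked on the CPU.
__host__ __device__ inline int global_index(int block, int bdx, int bdy, int tx, int ty)
{
    return block * (bdx * bdy) + ty * bdx + tx;
}

__global__ void addOne(float* a, int n)
{
    int i = global_index(blockIdx.x, blockDim.x, blockDim.y, threadIdx.x, threadIdx.y);
    if (i < n)   // surplus threads (768..1023 in add_one_768) do nothing
        a[i] += 1.0f;
}

void add_one_768(float* myArray)
{
    const int n = 768;
    dim3 dimBlock(16, 32);                                   // 512 threads per block
    int numBlocks = blocks_for(n, dimBlock.x * dimBlock.y);  // 2 blocks
    addOne<<<numBlocks, dimBlock>>>(myArray, n);
}
```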

jph4599 · April 27, 2009, 7:28pm

You should think about it in 1D first…

Suppose you want to process 768 elements, each with its own thread.

You could also use 3 blocks of 256 threads each… something like:

int numBlocks = 3;              // 3 x 256 = 768 threads, one per element
int numThreadsPerBlock = 256;
dim3 dimGrid(numBlocks);
dim3 dimBlock(numThreadsPerBlock);
func <<< dimGrid, dimBlock >>> (myArray);
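For completeness, a sketch of the host code around this 1D launch, assuming func simply adds 1 to each element (named addOne here) and that myArray holds 768 floats. run_example and count_mismatches are hypothetical helpers. It checks cudaGetLastError() right after the launch, which is where an invalid configuration (for example, a block that is too large) is reported.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void addOne(float* a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global 1D index
    if (i < n)
        a[i] += 1.0f;
}

// Counts elements where after[i] != before[i] + 1.
int count_mismatches(const float* before, const float* after, int n)
{
    assert(n >= 0);
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (after[i] != before[i] + 1.0f)
            ++bad;
    return bad;
}

// Runs the 3 x 256 launch on 768 elements.
// Returns the number of wrong elements, or -1 if a CUDA call failed.
int run_example()
{
    const int numBlocks = 3, numThreadsPerBlock = 256;
    const int n = numBlocks * numThreadsPerBlock;   // 768
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n), result(n);
    for (int i = 0; i < n; ++i)
        host[i] = static_cast<float>(i);

    float* myArray = nullptr;
    if (cudaMalloc(&myArray, bytes) != cudaSuccess)
        return -1;

    cudaError_t err = cudaMemcpy(myArray, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 dimGrid(numBlocks);
        dim3 dimBlock(numThreadsPerBlock);
        addOne<<<dimGrid, dimBlock>>>(myArray, n);
        err = cudaGetLastError();                   // launch-configuration errors
    }
    if (err == cudaSuccess)                         // blocking copy waits for the kernel
        err = cudaMemcpy(result.data(), myArray, bytes, cudaMemcpyDeviceToHost);
    cudaFree(myArray);

    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return count_mismatches(host.data(), result.data(), n);
}
```

run_example() can be called from main; changing numThreadsPerBlock also requires recomputing numBlocks so that their product still covers all elements.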

Once you understand kernel execution parameters, the GPU architecture and CUDA as a whole, I’d suggest experimenting with different block/grid dimensions…
