Max threads per block & Max thread dimensions


I am trying to do an example from the book “CUDA by Example”.

--------- This works fine when included with the rest of the code ------------------

#define N 65535

add<<<((N/THREADS_PER_BLOCK)),THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);

I ran another example from this book, which gave these numbers:

Max threads per block: 512
Max thread dimensions: (512, 512, 64)
Max grid dimensions: (65535, 65535, 1)
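
These numbers are fields of the runtime's cudaDeviceProp structure. Below is a minimal sketch (not the book's own listing) that reads the same three limits with cudaGetDeviceProperties. The comments note which launch parameter each limit bounds.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        // Cap on the total blockDim.x * blockDim.y * blockDim.z
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        // Separate caps on blockDim.x, blockDim.y and blockDim.z
        printf("  Max thread dimensions: (%d, %d, %d)\n",
               prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
        // Separate caps on gridDim.x, gridDim.y and gridDim.z (number of blocks)
        printf("  Max grid dimensions: (%d, %d, %d)\n",
               prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    }
    return 0;
}
```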

How do these numbers (Max threads per block, Max thread dimensions, Max grid dimensions) translate into these (N, THREADS_PER_BLOCK)?

Does that mean we can have N = 65535 * 65535 and THREADS_PER_BLOCK = 512 * 512 * 64?

How are Max threads per block, Max thread dimensions, and Max grid dimensions related?


The maximum number of threads per block is 512, and they may be arranged in any way. Writing Bx, By, Bz for blockDim.x, blockDim.y, blockDim.z: a one-dimensional block can have at most 512 threads, a two-dimensional block must satisfy Bx * By <= 512, and a three-dimensional block must satisfy Bx * By * Bz <= 512. The maximum thread dimensions are separate per-dimension limits on top of that total. A one-dimensional block of 512 threads can lie along either the x or the y direction, but Bz can never exceed 64, and the other two dimensions are then constrained by the 512-thread total.
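
The two per-block rules can be written as a single check. The following sketch hard-codes the limits from the device query in the question (512 and (512, 512, 64)) as hypothetical constants. `blockShapeFits` is an illustrative name, not a CUDA API. On other GPUs the limits should come from cudaDeviceProp instead.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>  // for dim3

// Assumed limits, copied from the device query output in the question.
const unsigned kMaxThreadsPerBlock = 512;
const unsigned kMaxThreadsDim[3] = {512, 512, 64};

// A block shape is legal only if each dimension is within its own limit
// AND the product of the three dimensions is within the per-block total.
bool blockShapeFits(dim3 b) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > kMaxThreadsDim[0] || b.y > kMaxThreadsDim[1] || b.z > kMaxThreadsDim[2])
        return false;
    // Each factor is already bounded, so this product cannot overflow.
    return b.x * b.y * b.z <= kMaxThreadsPerBlock;
}

int main() {
    dim3 shapes[] = {dim3(512), dim3(1, 512), dim3(16, 16, 2),
                     dim3(8, 8, 8), dim3(512, 512), dim3(1, 1, 128)};
    for (const dim3 &s : shapes)
        printf("(%u, %u, %u): %s\n", s.x, s.y, s.z,
               blockShapeFits(s) ? "ok" : "too big");
    return 0;
}
```

With these limits, dim3(1, 512) and dim3(16, 16, 2) are accepted. dim3(512, 512) is rejected because it has 262144 threads, and dim3(1, 1, 128) is rejected because z exceeds 64.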

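So in principle, a single kernel launch can have up to 65535 * 65535 * 512 threads (about 2.2 trillion), using a 65535 x 65535 grid of 512-thread blocks. They do not all run at the same time, though; the hardware schedules blocks as resources become free.
I think you are considering only one-dimensional grids and blocks. In that case the maxima are 65535 blocks and 512 threads per block, so in your launch N/THREADS_PER_BLOCK can be at most 65535 and THREADS_PER_BLOCK at most 512.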
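
The one-dimensional case reduces to arithmetic. The sketch below uses the same hard-coded limits and assumes 512 threads per block. `blocksFor` is a hypothetical helper that uses ceiling division, a common way to size the grid so that the last partial block is not dropped when N is not a multiple of the block size. The kernel then needs an `if (tid < N)` guard.

```cuda
#include <stdio.h>

// Assumed limits from the device query in the question.
const long long kMaxGridX = 65535;
const long long kMaxThreadsPerBlock = 512;

// Blocks needed to give each of n elements its own thread (ceiling division).
long long blocksFor(long long n, long long tpb) {
    return (n + tpb - 1) / tpb;
}

int main() {
    // Largest element count one 1D launch can cover: 65535 blocks of 512 threads.
    printf("1D maximum: %lld threads\n", kMaxGridX * kMaxThreadsPerBlock);

    long long sizes[] = {65535, 2048LL * 2048};
    for (long long n : sizes) {
        long long blocks = blocksFor(n, kMaxThreadsPerBlock);
        printf("N = %lld -> %lld blocks of %lld threads (%s)\n", n, blocks,
               kMaxThreadsPerBlock,
               blocks <= kMaxGridX ? "fits" : "exceeds gridDim.x limit");
    }
    return 0;
}
```

At 512 threads per block, 65535 elements need 128 blocks and 2048 * 2048 elements need 8192 blocks. Both counts are within the 65535-block limit, so the grid size alone would not explain a failure at the larger N.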

Thank you.

I was trying another example:

#define N (2048 * 2048)

add<<<((N/THREADS_PER_BLOCK)),THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);

I am getting a stack overflow error. When I use N = 65535, it works fine.
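
The host code is not shown, so this is a hedged guess. The book's vector-add examples declare the host arrays as locals, `int a[N], b[N], c[N];`, inside main. If yours does the same, the arrays live on the stack. At N = 65535 the three int arrays take about 768 KiB, but at N = 2048 * 2048 they take 3 * 16 MiB = 48 MiB. That is well beyond a typical default thread stack, which is commonly 1 MB on Windows and 8 MB on Linux.

The following minimal sketch moves the arrays to the heap with malloc. It makes three assumptions: THREADS_PER_BLOCK is 512, the kernel is the book-style add with a bounds check, and countMismatches is a hypothetical helper added for illustration.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N (2048 * 2048)
#define THREADS_PER_BLOCK 512  // assumed; the question does not show its value

// Element-wise vector add; the guard covers a final, partially filled block.
__global__ void add(const int *a, const int *b, int *c) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < N) c[tid] = a[tid] + b[tid];
}

// Hypothetical check: counts entries that differ from a[i] + b[i] = i + 2*i.
int countMismatches(const int *c, int n) {
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (c[i] != 3 * i) ++bad;
    return bad;
}

int main() {
    const size_t bytes = N * sizeof(int);  // 16 MiB per array

    // Heap allocations instead of `int a[N], b[N], c[N];` on the stack.
    int *a = (int *)malloc(bytes);
    int *b = (int *)malloc(bytes);
    int *c = (int *)malloc(bytes);
    if (!a || !b || !c) {
        fprintf(stderr, "host allocation failed\n");
        return 1;
    }
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2 * i;
    }

    int *dev_a = NULL, *dev_b = NULL, *dev_c = NULL;
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // Round the block count up so every element gets a thread.
    add<<<(N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(dev_a, dev_b, dev_c);

    // Report any earlier runtime error; cudaMemcpy waits for the kernel to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    else
        printf("%d mismatches\n", countMismatches(c, N));

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    free(a);
    free(b);
    free(c);
    return err == cudaSuccess ? 0 : 1;
}
```

Declaring the arrays `static` or at file scope also keeps them off the stack, and in C++ a `std::vector<int>` is another option.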