From the CUDA Programming Guide 4.0:
Maximum x-, y-, or z-dimension of a grid of thread blocks = 65535 [CC 1.0 - 2.x]
Maximum width and height for a 2D texture reference bound to linear memory or to a CUDA array = 65536 x 32768 [CC 1.0 - 1.3], 65536 x 65535 [CC 2.x]
I have no idea how this is implemented in hardware, but I am curious why those values are 65535 instead of 65536, especially the second one, where the height for CC 1.0 - 1.3 is a power of 2 and for CC 2.x it is not.
I know from older Programming Guides that a few values were changed from 65535 to 65536!
With 2 bytes (16 bits) you can represent values from 0 to 65535, not 65536.
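To make the 2-byte argument concrete, here is a minimal host-side C++ sketch. The helper names `max_u16` and `wrap_to_u16` are made up for illustration, and whether the hardware really keeps each grid dimension in a 16-bit field is an assumption of this reply, not something the Programming Guide states.

```cpp
#include <cstdint>
#include <limits>

// Largest value an unsigned 16-bit (2-byte) field can hold: 2^16 - 1.
constexpr std::uint32_t max_u16() {
    return std::numeric_limits<std::uint16_t>::max();
}

// Hypothetical helper: what happens when a larger value is squeezed
// into 16 bits. Unsigned narrowing is modulo 2^16, so 65536 wraps to 0.
constexpr std::uint16_t wrap_to_u16(std::uint32_t value) {
    return static_cast<std::uint16_t>(value);
}
```

Under that assumption, a dimension of 65536 would be indistinguishable from 0, which would explain a limit of 65535.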
Those limits are counts, so they are 1-based! The range should be 1 to 65536, since you cannot have 0 blocks in a dimension.
With a limit of 65535 you can only have block counts from 1 to 65535!
Likewise, you cannot have an array with a width or height of 0.
You can't have
square_array<<<dim3(65536, 5, 1), dim3(1, 16, 1)>>>(memoiregraphique1, f, N);
but you can have 2048 * 32 = 65536 threads along x:
square_array<<<dim3(2048, 5, 1), dim3(32, 16, 1)>>>(memoiregraphique1, f, N);
and then, inside the kernel, compute
int sh = threadIdx.x + threadIdx.y * 32;
int id = 512 * blockIdx.x + 1048576 * blockIdx.y + sh;
and have an id from 0 to 5242879 (5242880 distinct ids in total).
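The index arithmetic above can be checked on the host without a GPU. The following is a rough sketch in plain C++ that mirrors the kernel's formula for the launch configuration grid (2048, 5, 1), block (32, 16, 1). The names `kGridX`, `kBlockX`, `kThreadsPerBlock` and `global_id` are made up for this example. The function arguments stand in for `blockIdx.x`, `blockIdx.y`, `threadIdx.x` and `threadIdx.y`.

```cpp
#include <cstdint>

// Launch configuration taken from the post above.
constexpr std::uint32_t kGridX = 2048, kGridY = 5;
constexpr std::uint32_t kBlockX = 32, kBlockY = 16;

// 32 * 16 = 512 threads per block, the factor applied to blockIdx.x.
constexpr std::uint32_t kThreadsPerBlock = kBlockX * kBlockY;

// Host-side stand-in for the kernel's index computation:
//   sh = threadIdx.x + threadIdx.y * 32
//   id = 512 * blockIdx.x + 1048576 * blockIdx.y + sh
// where 1048576 = 512 * 2048 is the number of threads in one grid row.
constexpr std::uint32_t global_id(std::uint32_t block_x, std::uint32_t block_y,
                                  std::uint32_t thread_x, std::uint32_t thread_y) {
    return kThreadsPerBlock * block_x
         + kThreadsPerBlock * kGridX * block_y
         + (thread_x + thread_y * kBlockX);
}
```

With the last block (2047, 4) and the last thread (31, 15), `global_id` gives 512 * 2047 + 1048576 * 4 + 511 = 5242879. So the ids run from 0 to 5242879, one for each of the 2048 * 5 * 512 = 5242880 threads.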