I have a rendering kernel that I launch like this:
width = 320;
dim3 GridDimension = dim3(32, 16);
dim3 BlockDimension = dim3((width+GridDimension.x-1)/GridDimension.x,
KDKernel<<<GridDimension, BlockDimension>>>((unsigned char *)surface,
(uint4 *) NodesMemory,
(unsigned int *) IndicesMemory,
(float4 *) TrianglesMemory);
All of this works fine on my G80, but if I change GridDimension to
dim3 GridDimension = dim3(16, 16);
my app crashes. Why?
(Inside the kernel I check that I never write outside the allocated memory, so that is not the cause.)
How are the addresses of the global memory parameters passed to the individual threads
in the kernel? Via shared memory?
If so, in the first case (grid = 32x16) I would need 10*15*4*4 = 2400 bytes of shared memory (10*15 threads per block, 4 pointer parameters, 4 bytes each),
and in the second case (grid = 16x16) I would need 20*15*4*4 = 4800 bytes of shared memory.
My app is a 64-bit (x64) application, so are those addresses 8 bytes long each?
Or are they still 4 bytes long, since the GPU operates on a 32-bit address space? There is nothing about this in the docs.
Even if they are 8 bytes long each, my parameters should occupy 9600 bytes of shared memory, which is still under the 16 KB of shared memory per multiprocessor, so I still don't know why my kernel crashes.
(The kernel itself does not use any shared memory via the __shared__ qualifier.)
Any ideas?
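For reference, the estimates above can be written out as a small host-side sketch. It only reproduces the arithmetic of my guess that every thread holds its own copy of every pointer argument, which may well not be how the hardware actually passes parameters. The helper names ceilDiv and perThreadArgBytes are made up for this illustration, and the block height of 15 is an assumption taken from the 10*15 and 20*15 products.

```cpp
// Host-side arithmetic only; valid in a .cu file as well as in plain C++11.

// Ceiling division, the same rounding used for the block width in the launch code.
constexpr unsigned ceilDiv(unsigned a, unsigned b) { return (a + b - 1) / b; }

// ASSUMPTION (the guess made in this post, not documented behavior):
// every thread of a block keeps its own copy of every pointer argument.
constexpr unsigned perThreadArgBytes(unsigned blockX, unsigned blockY,
                                     unsigned pointerArgs, unsigned pointerBytes) {
    return blockX * blockY * pointerArgs * pointerBytes;
}

constexpr unsigned kWidth = 320;
constexpr unsigned kBlockY = 15;      // assumed, implied by the 10*15 and 20*15 products
constexpr unsigned kPointerArgs = 4;  // surface, nodes, indices, triangles

// Threads per block in x for the two launch configurations.
static_assert(ceilDiv(kWidth, 32) == 10, "grid 32x16 gives 10 threads in x");
static_assert(ceilDiv(kWidth, 16) == 20, "grid 16x16 gives 20 threads in x");

// The three byte counts quoted in the post.
static_assert(perThreadArgBytes(10, kBlockY, kPointerArgs, 4) == 2400, "32x16, 4-byte pointers");
static_assert(perThreadArgBytes(20, kBlockY, kPointerArgs, 4) == 4800, "16x16, 4-byte pointers");
static_assert(perThreadArgBytes(20, kBlockY, kPointerArgs, 8) == 9600, "16x16, 8-byte pointers");
```

The checks run at compile time, so the snippet compiling cleanly confirms that the three figures follow from the stated assumption. It does not say anything about whether the assumption itself is right.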