Does addressing for shared memory in Fermi start at 0. On previous hardware I have done something like the following test case which allows me to pad a set of structures in shared memory.
#define PADDING 0
#define OFFSET 0
#define SM_INDEX(i, s) ((i*(s+PADDING))+OFFSET)
__global__ void testKernel(int* g_odata)
{
int index = threadIdx.x+(blockIdx.x*blockDim.x);
extern __shared__ char sdata[];
int smIndex = SM_INDEX(threadIdx.x, sizeof(info));
info* s = (info*) &sdata[smIndex];
//pass
if (s){
g_odata[index] = 1;
}
//fail
else
{
g_odata[index] = 0;
}
}
On all my previous CUDA hardware this has been fine however on my current set up Windows 7 x64 with GTX480 and Cuda 3.0, threadIdx.x of every block always fails suggesting that memory addressing starts at 0x000000f. Is this a result of kernel arguments being stored in constant memory rather than shared memory as in previous architectures? i.e. address 0 was previously unavailbale. If I set the OFFSET to any value greater than 0 this passes no problem.
Is this the correct behaviour or a bug. I guess its obvious that memory addressing should start at 0 but this is usually reserved to indicate a NULL pointer.
Can anyone shed any light on this.
Thanks
Paul