Hello,
I am new to this field. I have just started running some examples and trying to understand the code. I would like to ask how to access the allocated shared memory.
In the CUDA Programming Guide version 0.8 I found this way to allocate the memory and access it:
extern __shared__ short array0[ ]; const int size0= 128;
extern __shared__ float array1[ ]; const int size1= 128;
#define ARRAY0(i) (array0[i])
#define ARRAY1(i) (array1[(i)+((size0+1)*sizeof(array0[0]))/sizeof(array1[0])])
Later I saw this other way, which is in version 1.0 of the CUDA Guide:
extern __shared__ char array[];
__device__ void func() // __device__ or __global__ function
{
short* array0 = (short*)array;
float* array1 = (float*)&array0[128];
int* array2 = (int*)&array1[64];
}
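For reference, here is a minimal sketch of how I understand the second approach being used end to end. The names layoutKernel and runLayoutDemo and the 32-element size for array2 are my own placeholders, not from the Guide; the sizes 128 and 64 come from the snippet above. As far as I can tell, the total size of the buffer has to be passed as the third launch argument.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element counts: 128 and 64 come from the Guide 1.0 snippet; the count for
// array2 is an assumption, since the snippet does not give one.
constexpr int N0 = 128;  // shorts
constexpr int N1 = 64;   // floats
constexpr int N2 = 32;   // ints (assumed)

extern __shared__ char array[];  // one dynamically sized shared buffer

// Hypothetical kernel: each sub-array is a pointer into the same buffer.
__global__ void layoutKernel(int* out)
{
    short* array0 = (short*)array;        // starts at byte 0
    float* array1 = (float*)&array0[N0];  // starts after 128 shorts
    int*   array2 = (int*)&array1[N1];    // starts after 64 floats

    int t = threadIdx.x;
    if (t < N2) array2[t] = t;            // touch the last sub-array
    __syncthreads();

    // Thread 0 reports the byte offsets of the sub-arrays from the buffer start.
    if (t == 0) {
        out[0] = (int)((char*)array1 - array);
        out[1] = (int)((char*)array2 - array);
        out[2] = array2[N2 - 1];
    }
}

// Hypothetical host helper: launches the kernel, fills offsets[3],
// and returns 0 on success.
int runLayoutDemo(int offsets[3])
{
    // Total bytes needed for all three sub-arrays.
    size_t bytes = N0 * sizeof(short) + N1 * sizeof(float) + N2 * sizeof(int);

    int* dOut = nullptr;
    if (cudaMalloc(&dOut, 3 * sizeof(int)) != cudaSuccess) return 1;

    layoutKernel<<<1, 64, bytes>>>(dOut);  // third argument: shared memory bytes

    cudaError_t err = cudaMemcpy(offsets, dOut, 3 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling runLayoutDemo from main and printing the three values should show where array1 and array2 begin inside the buffer. With these sizes each sub-array starts at a multiple of its element size, so I did not add any padding; other sizes might need it.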
I am using this latter way to allocate the memory. However, I am curious to know how the first approach works, because I would like to understand how the shared memory is arranged.
Thanks.