I would like to dynamically allocate arrays in shared memory, but I am having difficulty doing so. I have followed the advice in "CUDA, Supercomputing for the Masses: Part 5" from Dr. Dobb's.
But when I apply this method to two arrays, some of the information gets overwritten.
In my CUDA code I pass in a block size of 6 and a grid size of 4 (both 1D):
__global__ void run_sim(float *a, float *b, int buf_sz){
    const int g_idx = blockIdx.x*blockDim.x + threadIdx.x + buf_sz;
    const int l_idx = threadIdx.x + buf_sz;
    extern __shared__ int s_data[];

    // carve aT out of the dynamically allocated shared memory
    int *aT = &s_data[0];
    aT[l_idx] = g_idx;
    a[g_idx] = aT[l_idx];
}
And this code works as I expect, returning
[0 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0]
(buf_sz=2)
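For reference, the host-side launch looks roughly like this (the exact shared-memory byte count and allocation sizes here are illustrative, not my real values); the third launch-configuration argument is what sizes the extern __shared__ s_data[] array:
// Illustrative host-side setup: 4 blocks of 6 threads, buf_sz = 2.
const int buf_sz = 2;
const int n = 4 * 6 + 2 * buf_sz;                 // 28 output elements, as above
float *d_a, *d_b;
cudaMalloc((void**)&d_a, n * sizeof(float));
cudaMalloc((void**)&d_b, n * sizeof(float));
cudaMemset(d_a, 0, n * sizeof(float));
cudaMemset(d_b, 0, n * sizeof(float));
size_t smem_bytes = 1024 * sizeof(int);           // dynamic shared memory for s_data[]
run_sim<<<4, 6, smem_bytes>>>(d_a, d_b, buf_sz);
cudaDeviceSynchronize();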
Now I try to create two arrays by modifying the code as follows:
__global__ void run_sim(float *a, float *b, int buf_sz){
    const int g_idx = blockIdx.x*blockDim.x + threadIdx.x + buf_sz;
    const int l_idx = threadIdx.x + buf_sz;
    extern __shared__ int s_data[];

    int *aT = &s_data[0];
    aT[l_idx] = g_idx;
    a[g_idx] = aT[l_idx];

    // second array carved out of the same shared allocation,
    // offset by the desired size of aT
    int *bT = &s_data[300];   // 300 = desired size of a
    bT[l_idx] = g_idx;
    b[g_idx] = bT[l_idx];
}
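For this version I size the dynamic shared memory so that it covers both sub-arrays; roughly (again, illustrative numbers):
// Illustrative: s_data[] must span aT's 300 ints plus bT's slots.
size_t smem_bytes = (300 + 6 + 2 * buf_sz) * sizeof(int);
run_sim<<<4, 6, smem_bytes>>>(d_a, d_b, buf_sz);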
This returns b as expected; however, a now comes back in global memory as:
[ 2 3 0 0 0 0 0 0 10 11 0 0 0 0 0 0 18 19 0 0 0 0 0 0]
The questions are:
- Does anyone know of a better way to dynamically allocate arrays in shared memory?
- Can anyone diagnose the above problem?
Thanks,
G