Problem with accessing array of pointers Array of pointers

Hi,

When an array of pointers into shared memory is indexed with a built-in variable (threadIdx.z), I find that the results are inconsistent from run to run. Data from a different block (blockIdx) is fetched instead of the data it is supposed to fetch. The problem goes away if I use a switch statement that explicitly maps each index to its pointer.

For example:
__shared__ unsigned char *pcharray[4];
__shared__ unsigned char buf1[100], buf2[100], buf3[100], buf4[100];

pcharray [0] = buf1;
pcharray [1] = buf2;
pcharray [2] = buf3;
pcharray [3] = buf4;

inPtr = pcharray [threadIdx.z]; /* this does not work */


/* whereas if we write it as a switch statement, as given below, it works */
switch (threadIdx.z)
{
case 0:
    inPtr = pcharray [0];
    break;
case 1:
    inPtr = pcharray [1];
    break;
case 2:
    inPtr = pcharray [2];
    break;
case 3:
    inPtr = pcharray [3];
    break;
}

I would appreciate it if the experts could comment on why indexing the pointer array with threadIdx.z does not work. Thank you.

Seems OK. Are you sure you only have four threads? What are the others doing?
Do you need synchronisation between your threads?
Have you explicitly told the kernel how much shared memory to allocate?
<<<grid_size, block_size, shared_bytes>>>
Have you told it where to place each bufX, e.g. buf2 = &buf1[100]?
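
To make the last two questions concrete, here is a minimal sketch of the dynamic shared memory variant. As far as I know, the third launch parameter only sizes an `extern __shared__` array; statically sized `__shared__` arrays like the ones in the question are laid out by the compiler and do not need it. The names `carve_kernel`, `pool` and `run_carve_demo`, the block shape (8, 1, 4) and the tag values are all made up for illustration and are not from the original post.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: one dynamically sized shared allocation, carved into
// four 100-byte buffers by hand.
__global__ void carve_kernel(unsigned char *out)
{
    extern __shared__ unsigned char pool[];  // size is set by the launch

    unsigned char *buf1 = pool;
    unsigned char *buf2 = &buf1[100];
    unsigned char *buf3 = &buf2[100];
    unsigned char *buf4 = &buf3[100];
    unsigned char *bufs[4] = { buf1, buf2, buf3, buf4 };  // per-thread table

    unsigned char *inPtr = bufs[threadIdx.z];

    // The first thread of each z-slice tags its own buffer.
    if (threadIdx.x == 0 && threadIdx.y == 0)
        inPtr[0] = (unsigned char)(10 + threadIdx.z);
    __syncthreads();  // make the tags visible to the whole block

    // A different thread of the same slice reads the tag back.
    if (threadIdx.x == blockDim.x - 1 && threadIdx.y == 0)
        out[threadIdx.z] = inPtr[0];
}

// Host helper (hypothetical): returns the number of wrong results.
int run_carve_demo()
{
    unsigned char h_out[4] = { 0 };
    unsigned char *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(h_out));
    assert(err == cudaSuccess);

    size_t shared_bytes = 4 * 100;  // must cover all four carved buffers
    carve_kernel<<<1, dim3(8, 1, 4), shared_bytes>>>(d_out);

    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);

    int mismatches = 0;
    for (int z = 0; z < 4; ++z)
        if (h_out[z] != 10 + z) ++mismatches;
    return mismatches;
}
```

Calling `run_carve_demo()` from `main` should return 0 if every slice reads back its own tag. If `shared_bytes` is too small for the offsets used in the kernel, the accesses run past the allocation.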

Does it work with explicit synchronization?

__shared__ unsigned char *pcharray[4];
__shared__ unsigned char buf1[100], buf2[100], buf3[100], buf4[100];

pcharray [0] = buf1;
pcharray [1] = buf2;
pcharray [2] = buf3;
pcharray [3] = buf4;

__syncthreads();

inPtr = pcharray [threadIdx.z]; /* this does not work */

Your code looks OK as long as pcharray is never written again within the kernel. However, I can imagine that the compiler might get confused by the missing synchronization.
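
Here is a fuller sketch of that idea that can be compiled and checked on its own. It is only a guess at the surrounding kernel, since the original post does not show it: a single thread fills the pointer table, `__syncthreads()` separates the writes from the reads, and a second barrier separates filling the buffers from reading them. The kernel name, the host helper `run_pointer_table_demo`, the launch shape and the tag values are assumptions for illustration. Whether the missing barrier is the actual cause of the reported behaviour is not confirmed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel built around the pointer table from the question.
__global__ void pointer_table_kernel(unsigned char *out)
{
    __shared__ unsigned char *pcharray[4];
    __shared__ unsigned char buf1[100], buf2[100], buf3[100], buf4[100];

    // Only one thread per block fills the table.
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        pcharray[0] = buf1;
        pcharray[1] = buf2;
        pcharray[2] = buf3;
        pcharray[3] = buf4;
    }
    __syncthreads();  // table is complete before any thread reads it

    unsigned char *inPtr = pcharray[threadIdx.z];

    // The first thread of each z-slice writes a tag that encodes the block.
    if (threadIdx.x == 0 && threadIdx.y == 0)
        inPtr[0] = (unsigned char)(16 * blockIdx.x + threadIdx.z + 1);
    __syncthreads();  // buffers are filled before other threads read them

    // A different thread of the same slice reads the tag back.
    if (threadIdx.x == blockDim.x - 1 && threadIdx.y == 0)
        out[4 * blockIdx.x + threadIdx.z] = inPtr[0];
}

// Host helper (hypothetical): returns the number of wrong results.
int run_pointer_table_demo()
{
    const int blocks = 2, slices = 4;
    unsigned char h_out[blocks * slices] = { 0 };
    unsigned char *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(h_out));
    assert(err == cudaSuccess);

    pointer_table_kernel<<<blocks, dim3(8, 1, slices)>>>(d_out);

    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);

    int mismatches = 0;
    for (int b = 0; b < blocks; ++b)
        for (int z = 0; z < slices; ++z)
            if (h_out[b * slices + z] != 16 * b + z + 1) ++mismatches;
    return mismatches;
}
```

Because each tag encodes the block index, a non-zero return value would point to a slice reading data that belongs to another slice or another block.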

On a side note, why not use a single buffer?

__shared__ unsigned char buf[400];

inPtr = buf + 100 * threadIdx.z;
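
A minimal sketch of the single-buffer variant, again with made-up names (`single_buffer_kernel`, `run_single_buffer_demo`) and an assumed block shape of (8, 1, 4). Each z-slice owns 100 bytes of `buf`; every thread writes one byte and then reads its neighbour's byte after the barrier, so the check fails if a slice ever sees another slice's data. The sketch assumes blockDim.x is at most 100 so the writes stay inside the slice.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: one shared buffer, sliced by threadIdx.z.
__global__ void single_buffer_kernel(unsigned char *out)
{
    __shared__ unsigned char buf[400];
    unsigned char *inPtr = buf + 100 * threadIdx.z;  // this slice's 100 bytes

    // Each thread writes one byte of its slice (assumes blockDim.x <= 100).
    if (threadIdx.y == 0)
        inPtr[threadIdx.x] = (unsigned char)(50 * threadIdx.z + threadIdx.x);
    __syncthreads();  // all writes done before any cross-thread read

    // Each thread reads the byte written by its right-hand neighbour.
    if (threadIdx.y == 0) {
        unsigned int neighbour = (threadIdx.x + 1) % blockDim.x;
        out[threadIdx.z * blockDim.x + threadIdx.x] = inPtr[neighbour];
    }
}

// Host helper (hypothetical): returns the number of wrong results.
int run_single_buffer_demo()
{
    const int width = 8, slices = 4;
    unsigned char h_out[width * slices] = { 0 };
    unsigned char *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(h_out));
    assert(err == cudaSuccess);

    single_buffer_kernel<<<1, dim3(width, 1, slices)>>>(d_out);

    err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(d_out);

    int mismatches = 0;
    for (int z = 0; z < slices; ++z)
        for (int x = 0; x < width; ++x)
            if (h_out[z * width + x] != 50 * z + (x + 1) % width) ++mismatches;
    return mismatches;
}
```

The design choice here is that the pointer is computed in registers from threadIdx.z, so there is no shared pointer table that needs to be written and synchronized before use.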
