When I index an array of pointers into shared memory with a built-in variable (threadIdx.z), the results are inconsistent from run to run: data belonging to a different block is fetched instead of the data that is supposed to be read. The problem goes away if I use a switch statement that explicitly maps each index to its pointer.
For example: __shared__ unsigned char *pcharray[4]; __shared__ unsigned char buf1[100], buf2[100], buf3[100], buf4[100];
Seems OK. Are you sure you only have four threads? What are the others doing?
Do you need synchronisation between your threads?
Have you explicitly told the kernel how much shared memory to allocate?
<<<grid_size,block_size,shared_bytes>>>
Have you told it where to place bufX, e.g. buf2 = &buf1[100];
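To make the two questions above concrete: the third launch parameter and manual placement such as buf2 = &buf1[100] only matter when shared memory is declared as an unsized extern array. Statically sized __shared__ arrays like buf1[100] are laid out by the compiler and need neither. The following is a minimal sketch of the dynamic variant, not the original poster's code; the names carveKernel, runCarveDemo, kBufBytes and kNumBufs are made up for illustration, and a block shape of 32 x 1 x 4 is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBufBytes = 100;  // size of each sub-buffer, as in the thread
constexpr int kNumBufs  = 4;    // one sub-buffer per threadIdx.z value (assumed)

// Each z-slice of the block fills its own sub-buffer and reports one byte back.
__global__ void carveKernel(unsigned char *out)
{
    // Unsized array: its size comes from the third <<<...>>> launch parameter.
    extern __shared__ unsigned char smem[];

    // Manual placement in the style of "buf2 = &buf1[100]":
    // sub-buffer z starts at byte offset z * kBufBytes.
    unsigned char *buf = &smem[threadIdx.z * kBufBytes];

    for (int i = threadIdx.x; i < kBufBytes; i += blockDim.x)
        buf[i] = (unsigned char)(threadIdx.z + 1);
    __syncthreads();  // make all writes visible before reading

    if (threadIdx.x == 0)
        out[blockIdx.x * kNumBufs + threadIdx.z] = buf[kBufBytes - 1];
}

// Hypothetical host helper: returns the number of mismatches
// (0 on success) or -1 on a CUDA error.
int runCarveDemo(int numBlocks)
{
    const size_t outBytes = (size_t)numBlocks * kNumBufs;
    unsigned char *dOut = nullptr;
    if (cudaMalloc(&dOut, outBytes) != cudaSuccess) return -1;

    dim3 block(32, 1, kNumBufs);
    const size_t sharedBytes = (size_t)kNumBufs * kBufBytes;
    carveKernel<<<numBlocks, block, sharedBytes>>>(dOut);

    std::vector<unsigned char> hOut(outBytes);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, outBytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int b = 0; b < numBlocks; ++b)
        for (int z = 0; z < kNumBufs; ++z)
            if (hOut[b * kNumBufs + z] != (unsigned char)(z + 1)) ++mismatches;
    return mismatches;
}
```

Calling runCarveDemo(8) from main and checking for a zero return is one way to try this out. If shared_bytes is too small for the offsets used, the kernel reads and writes outside its allocation.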
__shared__ unsigned char *pcharray[4];
__shared__ unsigned char buf1[100], buf2[100], buf3[100], buf4[100];
pcharray[0] = buf1;
pcharray[1] = buf2;
pcharray[2] = buf3;
pcharray[3] = buf4;
__syncthreads();
inPtr = pcharray[threadIdx.z]; /* this does not work */
Your code looks OK as long as pcharray is never written to again within the kernel. However, I can imagine that the compiler might get confused by the missing synchronization.
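For anyone who wants to reproduce this, here is a small self-contained test built around the snippet above. It is a sketch under assumptions, not the original poster's kernel: the names pointerTableKernel and runPointerTableDemo are hypothetical, the block shape is assumed to be 32 x 1 x 4, and at most 64 blocks are used so the per-block tag fits in one byte. One thread fills the pointer table, a barrier follows, and every z-slice then writes and reads through its pointer with a value that encodes the block index, so data from another block would show up as a mismatch. One possible but unconfirmed explanation for the original behaviour: on compute capability 1.x devices the compiler cannot always tell that a pointer loaded from such a table refers to shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBufBytes = 100;  // buffer size from the thread
constexpr int kNumBufs  = 4;    // pcharray[4] in the thread

__global__ void pointerTableKernel(unsigned char *out)
{
    __shared__ unsigned char *pcharray[kNumBufs];
    __shared__ unsigned char buf1[kBufBytes], buf2[kBufBytes],
                             buf3[kBufBytes], buf4[kBufBytes];

    // A single thread initialises the table, so there are no concurrent writes.
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        pcharray[0] = buf1;
        pcharray[1] = buf2;
        pcharray[2] = buf3;
        pcharray[3] = buf4;
    }
    __syncthreads();  // the table must be complete before anyone reads it

    unsigned char *inPtr = pcharray[threadIdx.z];

    // The tag encodes both the block and the z-slice.
    const unsigned char tag =
        (unsigned char)(blockIdx.x * kNumBufs + threadIdx.z);
    for (int i = threadIdx.x; i < kBufBytes; i += blockDim.x)
        inPtr[i] = tag;
    __syncthreads();  // writes through inPtr must land before the read below

    if (threadIdx.x == 0)
        out[blockIdx.x * kNumBufs + threadIdx.z] = inPtr[kBufBytes - 1];
}

// Hypothetical host helper: returns the number of mismatches
// (0 on success) or -1 on a CUDA error or invalid block count.
int runPointerTableDemo(int numBlocks)
{
    if (numBlocks < 1 || numBlocks > 64) return -1;  // keep tags within a byte

    const size_t outBytes = (size_t)numBlocks * kNumBufs;
    unsigned char *dOut = nullptr;
    if (cudaMalloc(&dOut, outBytes) != cudaSuccess) return -1;

    dim3 block(32, 1, kNumBufs);
    pointerTableKernel<<<numBlocks, block>>>(dOut);  // static shared memory only

    std::vector<unsigned char> hOut(outBytes);
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, outBytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (size_t i = 0; i < outBytes; ++i)
        if (hOut[i] != (unsigned char)i) ++mismatches;
    return mismatches;
}
```

If runPointerTableDemo reports mismatches on your setup while an otherwise identical kernel using a switch on threadIdx.z does not, that would point at how the indexed pointer load is compiled rather than at the synchronization.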