Hi,
What problems could arise with the following code?
// Launched with numBlocks = 6 and numThreadsPerBlock = 8, so we have 48 threads.
__global__ void myFirstKernel(int *d_a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    d_a[0] = idx;
    for (int i = 0; i < 2; i++)
        d_a[idx] = d_a[0];
}
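For anyone who wants to reproduce this, here is a minimal sketch of a complete program around that kernel. Only the kernel and the launch configuration (6 blocks of 8 threads) come from this post; the host side, including the helper name `runMyFirstKernel`, the host buffer `h_a`, and zeroing the array before the launch, is an assumption about how it might be driven.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel as posted: every thread writes to d_a[0] and then reads it back.
__global__ void myFirstKernel(int *d_a)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    d_a[0] = idx;
    for (int i = 0; i < 2; i++)
        d_a[idx] = d_a[0];
}

// Hypothetical helper (not in the original post): allocates the device array,
// launches the kernel, and copies the result into h_a. Returns true on success.
bool runMyFirstKernel(int *h_a, int numBlocks, int numThreadsPerBlock)
{
    int n = numBlocks * numThreadsPerBlock;
    size_t bytes = n * sizeof(int);

    int *d_a = nullptr;
    if (cudaMalloc(&d_a, bytes) != cudaSuccess)
        return false;

    // Assumption: start from a zeroed array so no entry is uninitialized.
    cudaMemset(d_a, 0, bytes);

    myFirstKernel<<<numBlocks, numThreadsPerBlock>>>(d_a);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return err == cudaSuccess;
}

int main()
{
    const int numBlocks = 6;
    const int numThreadsPerBlock = 8;
    const int n = numBlocks * numThreadsPerBlock; // 48 threads in total

    int h_a[n];
    if (!runMyFirstKernel(h_a, numBlocks, numThreadsPerBlock)) {
        printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    for (int i = 0; i < n; i++)
        printf("d_a[%d] = %d\n", i, h_a[i]);
    return 0;
}
```

It should build with something like `nvcc repro.cu -o repro` (the file name is just a placeholder) and prints all 48 entries, so the full pattern can be compared with the values reported in this post.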
The results are:
d_a[0] = 45
d_a[1] = 45
d_a[2] = 46
d_a[3] = 46
d_a[4] = 47
d_a[5] = 47
…
d_a[32] = 23
d_a[33] = 47
d_a[34] = 47
…
d_a[47] = 47
Why does this happen? I thought each thread has its own local memory, and we don't use shared memory.
Thank you.