I have written a small program:
__global__ void resident_evil(float *d_A, float *d_B)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
__shared__ float smemA[BLOCK_SIZE];
smemA[idx] = d_A[idx]; // This copy is done deliberately to produce the behaviour below
__syncthreads();
smemA[idx] += 10;
d_B[idx] = smemA[idx];
}
and called it with
resident_evil <<< 5,192 >>> (dev_A, dev_B);
and the output is
rslt[187]=197.000000
rslt[188]=198.000000
rslt[189]=199.000000
rslt[190]=200.000000
rslt[191]=201.000000
rslt[192]=202.000000
rslt[193]=203.000000
rslt[194]=204.000000
rslt[195]=205.000000
rslt[196]=206.000000
rslt[197]=207.000000
rslt[198]=208.000000
rslt[199]=209.000000
rslt[200]=210.000000
rslt[201]=211.000000
rslt[202]=10.000000
rslt[203]=10.000000
The question is: how did the underlined values come about? Why doesn't it stop at position 191? Does anyone have any insight?
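
For reference, the index arithmetic of this launch can be checked on the host. This is only a minimal sketch: it assumes BLOCK_SIZE is 192 (the value is not stated above and is chosen here to match the launch), and the helper names global_index and count_out_of_range are hypothetical, not part of the kernel. It reproduces the kernel's idx computation and counts how many threads get an idx that lies outside a shared array of BLOCK_SIZE elements.

```cpp
// Assumed value: BLOCK_SIZE is not given in the post; 192 matches <<<5,192>>>.
constexpr int BLOCK_SIZE = 192;

// Same computation as in the kernel: blockIdx.x * blockDim.x + threadIdx.x.
constexpr int global_index(int block, int block_dim, int thread) {
    return block * block_dim + thread;
}

// Counts the threads of a <<<grid, block_dim>>> launch whose global index
// is not a valid index into an array of BLOCK_SIZE elements.
int count_out_of_range(int grid, int block_dim) {
    int count = 0;
    for (int b = 0; b < grid; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            if (global_index(b, block_dim, t) >= BLOCK_SIZE) {
                ++count;
            }
        }
    }
    return count;
}
```

With these assumptions, global_index(4, 192, 191) gives the largest idx of the launch, and count_out_of_range(5, 192) gives the number of threads in blocks 1 to 4 that index smemA past its declared size. Only block 0 has idx equal to threadIdx.x.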