__global__ void init_array(int *g_data, int *factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = *factor; // uncoalesced on purpose to burn some time
}
The code above is quoted from simpleStream.cu.
The comment says this is an uncoalesced memory access pattern, but it looks coalesced to me.
Thread 0 accesses g_data[0], thread 1 accesses g_data[1], …, and each element of g_data is a 4-byte int (aligned). I think this complies with the definition of coalesced access, unless the base address of g_data is unaligned.
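To make my reasoning concrete, here is a minimal host-side sketch (plain C++, no GPU needed) that counts how many memory segments one warp would touch for a given base address and stride. The warp size of 32, the 128-byte segment size, and the helper name segments_touched are my own assumptions for illustration; none of them come from simpleStream.cu, and real hardware rules vary by architecture.

```cpp
#include <cstdint>
#include <set>

// Assumptions (not taken from simpleStream.cu): 32 threads per warp,
// 128-byte memory segments, and 4-byte int elements.
constexpr int kWarpSize = 32;
constexpr std::uintptr_t kSegmentBytes = 128;
constexpr std::uintptr_t kElemBytes = 4;

// Hypothetical helper: counts the distinct 128-byte segments touched when
// thread t of one warp accesses element (t * stride) of an int array that
// starts at byte address `base`.
int segments_touched(std::uintptr_t base, int stride)
{
    std::set<std::uintptr_t> segments;
    for (int t = 0; t < kWarpSize; ++t) {
        // Byte address accessed by thread t.
        std::uintptr_t addr = base
            + static_cast<std::uintptr_t>(t)
            * static_cast<std::uintptr_t>(stride) * kElemBytes;
        // Record the index of the segment that contains this address.
        segments.insert(addr / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under these assumptions, segments_touched(0, 1), which is the pattern I believe g_data[idx] produces with an aligned base, gives 1 segment. A base offset by 4 bytes gives 2, and a stride of 32 elements gives 32.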
Can anyone help identify where I am wrong?