question in the sample code (simpleStream.cu)

frankyoung · November 23, 2007, 10:16am

__global__ void init_array(int *g_data, int *factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = *factor; // uncoalesced on purpose to burn some time
}

The code above is quoted from simpleStream.cu.

The comment says this is an uncoalesced memory access pattern, but it seems to me that it is coalesced.

Thread 0 accesses g_data[0], thread 1 accesses g_data[1], …, and each element of g_data is a 4-byte int (aligned). I think this complies with the definition of coalesced access unless the base address of g_data is unaligned.

Can anyone help identify where I am wrong?

Thanks!

MisterAnderson42 · November 23, 2007, 3:03pm

You are correct: that write is coalesced unless blockDim.x is not a multiple of 32. I glanced at the simpleStreams code and it does seem that blockDim.x is a multiple of 32, so it is coalesced.

paulius · November 25, 2007, 12:13am

The reads and writes of data pointed to by g_data are coalesced. It’s the read of factor that’s not coalesced - all 16 threads in a half-warp read the same address from global memory. The sample was designed to illustrate the benefit and use of streams, so everything (including the kernel) was kept really simple. Reading factor is uncoalesced so that the kernel takes more time and the benefit of overlap is observable (if you change factor to be passed as an int, rather than a pointer, you’ll see that the kernel time goes down significantly).

Paulius
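
For anyone who wants to try the comparison described above, here is a minimal sketch, not code from the SDK sample. It assumes the by-value variant simply takes factor as an int argument. The names init_array_ptr, init_array_val and compare_variants are made up for illustration, and error checking is left out for brevity. On later hardware with caches, the timing gap may be much smaller than on the GPUs discussed in this thread.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same access pattern as the sample: every thread reads one global address.
__global__ void init_array_ptr(int *g_data, const int *g_factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = *g_factor;
}

// Hypothetical variant: factor is passed by value as a kernel argument,
// so no global memory read is needed to obtain it.
__global__ void init_array_val(int *g_data, int factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = factor;
}

// Runs both variants over n ints (assumption: n is a multiple of 256),
// prints the elapsed time of each, and returns true if both kernels
// filled every element with `value`.
bool compare_variants(int n, int value)
{
    const int threads = 256;
    const int blocks = n / threads;
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);

    int *d_a = nullptr, *d_b = nullptr, *d_factor = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_factor, sizeof(int));
    cudaMemcpy(d_factor, &value, sizeof(int), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms_ptr = 0.0f, ms_val = 0.0f;

    // Time the pointer version.
    cudaEventRecord(start);
    init_array_ptr<<<blocks, threads>>>(d_a, d_factor);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_ptr, start, stop);

    // Time the by-value version.
    cudaEventRecord(start);
    init_array_val<<<blocks, threads>>>(d_b, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_val, start, stop);

    printf("pointer: %.3f ms, by value: %.3f ms\n", ms_ptr, ms_val);

    // Copy both results back and verify them on the host.
    std::vector<int> h_a(n), h_b(n);
    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (h_a[i] != value || h_b[i] != value) ok = false;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_factor);
    return ok;
}
```

Calling compare_variants(1 << 20, 3) from main should print both timings. The first launch also pays one-time startup cost, so running the function twice and reading the second set of numbers gives a fairer comparison.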

frankyoung · November 26, 2007, 6:54am

Thanks.

I had assumed “factor” was a register variable. Following the naming convention, it should be called g_factor.