Illogical bandwidth results for global memory reads and writes

Hello

I ran some bandwidth tests on global memory.

There are 3 different tests:

  • read only

  • write only

  • read and write

I expected the “read and write” result to be the average of “read only” and “write only”. The result was different: “read and write” was much faster than either “read only” or “write only”. I multiplied the “read and write” bandwidth by 2 because each element is transferred twice (one read and one write). The data type in all measurements is float.
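A minimal sketch of the bandwidth calculation described above, assuming bandwidth is reported as bytes moved divided by elapsed kernel time. The function name and its parameters are hypothetical and are not taken from the actual test program; it is plain host-side C++ and should also compile inside a .cu file.

```cpp
#include <cassert>
#include <cstddef>

// Effective bandwidth in GB/s (assuming 1 GB = 1e9 bytes).
// transfers_per_element: 1 for a read-only or write-only kernel,
// 2 for a copy kernel (one read plus one write per element).
double effective_bandwidth_gbps(std::size_t num_elements,
                                std::size_t bytes_per_element,
                                unsigned transfers_per_element,
                                double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    const double bytes_moved = static_cast<double>(num_elements)
                             * static_cast<double>(bytes_per_element)
                             * static_cast<double>(transfers_per_element);
    return bytes_moved / elapsed_seconds / 1e9;
}
```

For example, a copy of 1,000,000 floats (4 bytes each) that takes a hypothetical 1 ms moves 8,000,000 bytes, which comes to about 8 GB/s; the same element count in a read-only or write-only kernel with the same time would come to about 4 GB/s.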

The kernels:

// copy data from global to global memory ("read and write")
template <class T> __global__ void copy_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = g_idata[idx];
}

// copy data from global to shared memory ("read only")
template <class T> __global__ void read_only_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    __shared__ T shared[BLOCK_SIZE];
    shared[threadIdx.x] = g_idata[idx];
    *((float *)(&shared[(threadIdx.x + 1) & (BLOCK_SIZE-1)])) += 1.0;
}

// write a constant to global memory ("write only")
template <class T> __global__ void write_only(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = c;
}

The results are in the attachment.

Question:

What is the reason for the higher bandwidth in the “read and write” test? Why is it not the average of “read only” and “write only”?

Greetings,
lanzelot
globalTest2.JPG