Hello
I did some bandwidth tests with global memory.
There are 3 different tests:
- read only
- write only
- read and write
I expected the “read and write” test to give roughly the average of the “read only” and “write only” bandwidths. The result was different: “read and write” was much faster than both “read only” and “write only”. For the “read and write” test I multiplied the bandwidth by 2, because each element is transferred twice (one read and one write). The data type in all measurements is float.
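The bandwidth figures described above come from bytes moved divided by kernel time. A minimal sketch of that calculation, written as plain host C++ so it also compiles unchanged in a .cu file; the function name `effective_bandwidth_gbs` is hypothetical, and it assumes 1 GB = 1e9 bytes:

```cpp
#include <cassert>
#include <cstddef>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes).
//   n         number of elements per array
//   elem_size bytes per element (sizeof(float) in these tests)
//   transfers global-memory transfers per element:
//             1 for "read only" or "write only", 2 for "read and write"
//   ms        measured kernel time in milliseconds
double effective_bandwidth_gbs(std::size_t n, std::size_t elem_size,
                               int transfers, double ms)
{
    assert(ms > 0.0 && transfers > 0);
    const double bytes = static_cast<double>(n) *
                         static_cast<double>(elem_size) * transfers;
    return bytes / (ms * 1.0e-3) / 1.0e9;   // bytes per second -> GB/s
}
```

For example, with made-up numbers (not the results from the attachment), 2^24 floats handled in 10 ms give about 6.71 GB/s for a single transfer and about 13.42 GB/s when the read and the write are both counted.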
The kernels:
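// BLOCK_SIZE is defined elsewhere; it must equal blockDim.x and be a power of two,
// because read_only_gmem masks the shared-memory index with (BLOCK_SIZE-1).
// None of the kernels has a bounds check, so the number of elements must be
// a multiple of the block size.

//copy data from global to global memory (global to global or "read and write")
template <class T> __global__ void copy_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = g_idata[idx];   // one global read + one global write per thread
}

//copy data from global to shared memory (read only)
template <class T> __global__ void read_only_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    __shared__ T shared[BLOCK_SIZE];
    shared[threadIdx.x] = g_idata[idx];   // one global read per thread
    // dummy update of the neighbouring shared element (float literal, no double promotion)
    *((float *)(&shared[(threadIdx.x + 1) & (BLOCK_SIZE-1)])) += 1.0f;
}

//writes a constant to the global memory (write only)
template <class T> __global__ void write_only(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = c;   // one global write per thread
}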
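The post does not show how the kernels were timed. One common way is CUDA events around repeated launches, sketched below under stated assumptions: BLOCK_SIZE = 256, an element count that is a multiple of it, and the hypothetical helpers `launch` and `time_kernel_ms`. This is an illustration, not necessarily how the attached numbers were measured.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Assumption: 256 threads per block. It must be a power of two because
// read_only_gmem masks the shared index with (BLOCK_SIZE - 1).
#define BLOCK_SIZE 256

// Kernels from the post, repeated so this file compiles on its own.
template <class T> __global__ void copy_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = g_idata[idx];
}

template <class T> __global__ void read_only_gmem(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    __shared__ T shared[BLOCK_SIZE];
    shared[threadIdx.x] = g_idata[idx];
    *((float *)(&shared[(threadIdx.x + 1) & (BLOCK_SIZE - 1)])) += 1.0f;
}

template <class T> __global__ void write_only(T* g_idata, T* g_odata, T c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    g_odata[idx] = c;
}

// Hypothetical helper: which = 0 (read and write), 1 (read only), 2 (write only).
static void launch(int which, float* in, float* out, unsigned int blocks)
{
    switch (which) {
    case 0:  copy_gmem<float><<<blocks, BLOCK_SIZE>>>(in, out, 0.0f); break;
    case 1:  read_only_gmem<float><<<blocks, BLOCK_SIZE>>>(in, out, 0.0f); break;
    default: write_only<float><<<blocks, BLOCK_SIZE>>>(in, out, 1.0f); break;
    }
}

// Hypothetical helper: mean kernel time in milliseconds over `reps` launches
// on n floats, after one untimed warm-up launch. The block count must stay
// within the device's grid limit (65535 in x on older GPUs).
float time_kernel_ms(int which, float* in, float* out, size_t n, int reps)
{
    assert(n % BLOCK_SIZE == 0 && reps > 0);
    const unsigned int blocks = (unsigned int)(n / BLOCK_SIZE);

    launch(which, in, out, blocks);   // warm-up, not timed
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)
        launch(which, in, out, blocks);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);       // wait until all timed launches finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}
```

To turn the returned time into GB/s, divide n * sizeof(float) * transfers by ms * 1e-3 (transfers = 2 for copy_gmem, 1 for the other two kernels). Averaging over several launches helps because a single short kernel is close to the event timer's resolution of roughly half a microsecond. Error checking is omitted for brevity.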
The results are in the attachment.
Question:
What is the reason for the higher bandwidth in the “read and write” test? (Why is it not the average of “read only” and “write only”?)
Greetings, lanzelot