Hello
I am running some bandwidth tests with texture memory, and I measured some reproducible drops in bandwidth.
I read texture memory in two ways:
- always from a different address (not cached)
- always from the same address (cached)
The kernels:
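// Copy data from texture memory to shared memory (each thread fetches a different address)
__global__ void read_only_tex_float(float* g_idata, float* g_odata, float c)
{
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    __shared__ float shared[BLOCK_SIZE];
    // One fetch per thread, each from its own element
    shared[threadIdx.x] = tex1Dfetch(tex_float, idx);
    // Touch the neighbouring element; the mask wraps the index only if BLOCK_SIZE is a power of two
    *((float *)(&shared[(threadIdx.x + 1) & (BLOCK_SIZE-1)])) += 1.0f;
}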
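// Copy data from texture memory (cache) to shared memory (every thread fetches element 0)
__global__ void read_only_tex_cashed(float* g_idata, float* g_odata, float c)
{
    __shared__ float shared[BLOCK_SIZE];
    // All threads fetch the same element
    shared[threadIdx.x] = tex1Dfetch(tex_float, 0);
    // Touch the neighbouring element; the mask wraps the index only if BLOCK_SIZE is a power of two
    *((float *)(&shared[(threadIdx.x + 1) & (BLOCK_SIZE-1)])) += 1.0f;
}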
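For context, a bandwidth figure for kernels like these could be derived from the elapsed kernel time roughly as follows. This is only a minimal host-side sketch: the function name `effective_bandwidth_gbps` and its parameters are hypothetical, and it assumes exactly one 4-byte `float` fetch per thread and an elapsed time given in milliseconds (the unit that `cudaEventElapsedTime` reports).

```cpp
#include <cstddef>

// Hypothetical helper: effective read bandwidth in GB/s (1 GB = 1e9 bytes).
// Assumes every thread performs exactly one 4-byte float fetch, as in the
// kernels above, and that elapsed_ms is the measured kernel time in ms.
double effective_bandwidth_gbps(std::size_t num_blocks,
                                std::size_t threads_per_block,
                                double elapsed_ms)
{
    // Total bytes fetched by the whole grid
    const double bytes_read = static_cast<double>(num_blocks)
                            * static_cast<double>(threads_per_block)
                            * static_cast<double>(sizeof(float));
    // bytes / (ms * 1e-3 s/ms) / 1e9 bytes/GB == bytes / (ms * 1e6)
    return bytes_read / (elapsed_ms * 1.0e6);
}
```

For example, 1000 blocks of 250 threads finishing in 1 ms fetch 1,000,000 bytes in 0.001 s, which is 1 GB/s. For the cached kernel this number counts fetches served from the texture cache, so it is a fetch rate rather than a device-memory bandwidth.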
The results of the 3 measurements are in the attachment.
Now my question:
What is the reason for these reproducible drops in bandwidth?
Greetings,
Lanzelot
TextureReadx.pdf (62.5 KB)