Reproducible bandwidth drops in texture memory read test

Hello

I am running some bandwidth tests with texture memory, and I measured some drops in bandwidth which are reproducible.

I read texture memory in two ways:

  • always from a different address (not cached)

  • always from the same address (cached)

The kernels:

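// Copy data from texture memory to shared memory.
// Every thread fetches a different address (not cached).
__global__ void read_only_tex_float(float* g_idata, float* g_odata, float c)
{
	const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
	__shared__ float shared[BLOCK_SIZE];

	// One 4-byte texture fetch per thread.
	shared[threadIdx.x] = tex1Dfetch(tex_float, idx);

	// Touch the neighbouring element; the mask wraps correctly only if
	// BLOCK_SIZE is a power of two.
	shared[(threadIdx.x + 1) & (BLOCK_SIZE - 1)] += 1.0f;
}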

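// Copy data from texture memory (cache) to shared memory.
// Every thread fetches the same address, element 0 (cached).
__global__ void read_only_tex_cashed(float* g_idata, float* g_odata, float c)
{
	__shared__ float shared[BLOCK_SIZE];

	// One 4-byte texture fetch per thread, always from index 0.
	shared[threadIdx.x] = tex1Dfetch(tex_float, 0);

	// Touch the neighbouring element; the mask wraps correctly only if
	// BLOCK_SIZE is a power of two.
	shared[(threadIdx.x + 1) & (BLOCK_SIZE - 1)] += 1.0f;
}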

The results of the 3 measurements are in the attachment.
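
For context, here is a minimal sketch of how a bandwidth figure can be derived from a kernel timing. The helper name and its parameters are illustrative assumptions and are not taken from the test program above. It assumes each thread issues a single 4-byte tex1Dfetch and that the elapsed time in milliseconds comes from some timer (for example CUDA event timing).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the original test program.
// Returns the effective read bandwidth in GB/s (1 GB = 1e9 bytes), given
// the number of launched threads, the bytes each thread fetches, and the
// measured kernel time in milliseconds.
inline double effective_bandwidth_gbs(std::size_t num_threads,
                                      std::size_t bytes_per_thread,
                                      double elapsed_ms)
{
    assert(elapsed_ms > 0.0);  // a zero or negative time is a measurement error

    const double total_bytes = static_cast<double>(num_threads) *
                               static_cast<double>(bytes_per_thread);
    const double seconds = elapsed_ms / 1000.0;
    return total_bytes / seconds / 1e9;
}
```

With one float fetched per thread, a call would look like effective_bandwidth_gbs(gridSize * BLOCK_SIZE, sizeof(float), elapsed_ms), where gridSize and elapsed_ms are placeholders for the launch configuration and the measured time.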

Now my question:

What is the reason for these reproducible drops?

Greetings, Lanzelot
TextureReadx.pdf (62.5 KB)