A simple kernel:
int tid = threadIdx.x + blockDim.x * blockIdx.x;
case 1: B[tid]=tex1Dfetch(tex_A,tid);
case 2: B[tid]=A[tid];
In both cases there should be no temporal reuse of array A (the texture reference is tex_A).
I used nvprof and found that the number of L1 texture cache sector queries in case 1 is 8 times the number of L1 cache reads in case 2, and the texture cache hit rate is almost 50.00%. For case 2, the cache hit rate is 0.00%.
I understand that the texture cache sector query counter increments by one for each 32-byte access, and the L1 cache read request counter increments by one for each 128-byte access. In this case, the L1 texture cache sector queries for case 1 should be 4 * case 2. I know that the additional 4 * case 2 (8 * case 2 - 4 * case 2) is the reason for the 50.00% texture cache hit rate.
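To spell out the arithmetic I have in mind (only a sketch: it assumes fully coalesced accesses and counts per 128 bytes of A; the symbols Q_tex for the case 1 sector queries and R_L1 for the case 2 L1 reads are my own shorthand, not nvprof names):

```latex
\begin{align*}
Q_{\text{tex}}^{\text{expected}} &= \frac{128\ \text{bytes}}{32\ \text{bytes}} \cdot R_{L1} = 4\,R_{L1} \\
Q_{\text{tex}}^{\text{observed}} &= 8\,R_{L1} \\
\text{hits} &\approx 0.5 \cdot 8\,R_{L1} = 4\,R_{L1} \\
\text{misses} &\approx 8\,R_{L1} - 4\,R_{L1} = 4\,R_{L1}
\end{align*}
```

So the number of misses matches the 4 * case 2 I expected, and the extra queries all seem to be counted as hits.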
Can anyone help explain why, and how tex1Dfetch loads data into the cache? Thanks.