Texture fetches and computeprof counters What does the 'tex cache requests' counter really m

Hello all,

I have a problem understanding the ‘tex cache requests’ counter and the ‘Texture cache hit rate’ derived statistic in Visual Profiler 4.0. I run the following memory copy kernel that operates on a 3D cube of dimensions dimx*dimy*dimz bound to a 1D texture:

const int ix = blockIdx.x*blockDim.x + threadIdx.x;
const int iy = blockIdx.y*blockDim.y + threadIdx.y;
int i1 = ix + iy*pitch + texoffset;  // tex1Dfetch takes an integer element index
uint kk;

// one texture fetch and one global store per iteration
for(kk=kstart-1; kk<kend+1; kk++){
    out[ix + iy*pitch + kk*pitch*dimy] = tex1Dfetch(texData1D, i1);
    i1 += pitch;
}

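For reference, the output indexing in the kernel can be sketched on the host as a small helper. This is only an illustration: the name `flat_index` is hypothetical, and it assumes the output index reads ix + iy*pitch + kk*pitch*dimy, with `pitch` counted in elements rather than bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not from the original kernel): flat element offset of
// grid point (ix, iy, kk) in a pitched 3D array, assuming the layout
// ix + iy*pitch + kk*pitch*dimy with pitch given in elements.
std::size_t flat_index(std::size_t ix, std::size_t iy, std::size_t kk,
                       std::size_t pitch, std::size_t dimy) {
    return ix + iy * pitch + kk * pitch * dimy;
}
```

With an assumed pitch of 64 elements and dimy = 4, the point (3, 2, 1) maps to offset 3 + 2*64 + 1*64*4.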
Now the Visual Profiler reports a 50% texture cache hit rate (!!) for a memory copy procedure with no data reuse. Moreover, if I compute the number of requested bytes by


I get a value that is twice the size of the domain I use, i.e., 2*dimx*dimy*dimz*sizeof(float). On the other hand, the ‘tex_cache_misses’ counter reports exactly what is expected, i.e., half of tex_cache_requests.
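To make the arithmetic concrete, here is a minimal host-side sketch of the two quantities being compared. The function names are hypothetical, and the hit-rate formula (requests - misses) / requests is an assumption that happens to match the 50% figure when misses are half of the requests; I do not know that it is exactly how the profiler derives the statistic.

```cpp
#include <cstddef>

// Bytes a pure copy is expected to read: one float per grid point.
std::size_t domain_bytes(std::size_t dimx, std::size_t dimy, std::size_t dimz) {
    return dimx * dimy * dimz * sizeof(float);
}

// Assumed derivation of the hit rate from the two raw counters.
// Returns a fraction in [0, 1]; 0 when there were no requests.
double hit_rate(unsigned long long requests, unsigned long long misses) {
    if (requests == 0) return 0.0;
    return static_cast<double>(requests - misses) / static_cast<double>(requests);
}
```

With made-up counter values of 2000 requests and 1000 misses, `hit_rate` gives 0.5, which is the situation described above: the miss count matches the expected traffic, while the request count is twice that.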

The question is: why is tex_cache_requests twice as large as expected? Is the texture fetch instruction reissued on a texture cache miss, as I think is the case for an L1 cache miss? If so, why is this not counted in the ‘replayed instructions’ statistic, which is 0 in my case?

I would appreciate any insights…