Hello all,
I have a problem understanding the ‘tex cache requests’ counter and the ‘Texture cache hit rate’ derived statistic in Visual Profiler 4.0. I run the following memory copy kernel, which operates on a 3D cube of dimensions dimx*dimy*dimz bound to a 1D texture:
const int ix = blockIdx.x*blockDim.x + threadIdx.x;
const int iy = blockIdx.y*blockDim.y + threadIdx.y;
float i1 = ix+iy*pitch+texoffset;
uint kk;
for(kk=kstart-1; kk<kend+1; kk++){
    out[ix + iy*pitch + kk*pitch*dimy] = tex1Dfetch(texData1D, i1);
    i1 += pitch*dimy;
}
Now the Visual Profiler reports a 50% texture cache hit ratio (!!) for a memory copy procedure with no data reuse. Moreover, if I compute the number of requested bytes as
tex_cache_requests*SM*32
I get a value that is twice as large as the domain I use, i.e., 2*dimx*dimy*dimz*sizeof(float). On the other hand, the ‘tex_cache_misses’ counter reports exactly what is expected, i.e., half of tex_cache_requests.
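To make the arithmetic explicit, here is a minimal host-side sketch of the comparison described above. The function names are hypothetical and only for illustration. It assumes that SM in the formula is the number of multiprocessors, that each request accounts for 32 bytes, and that the hit rate is derived as (requests - misses) / requests, which is consistent with the 50% figure reported when misses are half of requests.

```cpp
#include <cassert>
#include <cstdint>

// Bytes implied by the counter, following the formula in the post:
// tex_cache_requests * SM * 32.
// Assumption: num_sms is the number of multiprocessors, and each
// request stands for 32 bytes.
inline std::uint64_t requested_bytes(std::uint64_t tex_cache_requests,
                                     std::uint64_t num_sms,
                                     std::uint64_t bytes_per_request = 32)
{
    return tex_cache_requests * num_sms * bytes_per_request;
}

// Bytes a pure copy of the dimx*dimy*dimz float cube has to read once.
inline std::uint64_t domain_bytes(std::uint64_t dimx,
                                  std::uint64_t dimy,
                                  std::uint64_t dimz)
{
    return dimx * dimy * dimz * sizeof(float);
}

// Assumed definition of the derived statistic:
// hit rate = (requests - misses) / requests.
inline double hit_rate(std::uint64_t requests, std::uint64_t misses)
{
    assert(requests > 0 && misses <= requests);
    return static_cast<double>(requests - misses) /
           static_cast<double>(requests);
}
```

With these helpers, the observation is that requested_bytes(...) comes out at about 2 * domain_bytes(dimx, dimy, dimz), while hit_rate(requests, requests / 2) gives 0.5.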
The question is: why is tex_cache_requests two times too large? Is the texture fetch instruction reissued on a texture cache miss, as I think is the case for an L1 cache miss? If so, why is this not counted in the ‘replayed instructions’ statistic, which is 0 in my case?
I would appreciate any insights…
Thanks!
Marcin