Profiler shows zero texture cache activity even though I use texture memory

Here’s my code:

texture<ushort, 1, cudaReadModeElementType> g_TexRef;

const int size = sizeof(ushort)*width*height;
cudaMalloc( &gpuSrc.elements, size );
cudaChannelFormatDesc ushortDesc = cudaCreateChannelDesc<ushort>();
cudaBindTexture (0, g_TexRef, gpuSrc.elements , ushortDesc, size);

// And in the kernel:
__device__ void GetMaxLoc(Matrix A, int* maxX, int* maxY, ushort* max) {
	for (int y = 0; y<A.height; y++) {
		for (int x = 0; x<A.width; x++) {
			ushort elm = tex1Dfetch(g_TexRef, A.offset + y*A.stride+x);
			//...
		}
	}
}

However, the Visual Profiler shows zero texture cache activity, as shown in the screenshot. Is there something wrong with the code, or is it a profiler issue?

I have a GeForce GT 620M (Fermi).

I found out what was wrong. Most likely the NVIDIA Visual Profiler uses the tex_cache_sector_queries event to compute this metric:

tex_cache_throughput 	Texture cache throughput 	tex_cache_sector_queries * 32 / gputime

But

nvprof --query-events

shows that no such event exists on my system. Instead, I have tex0_cache_sector_queries and tex1_cache_sector_queries, which the profiler apparently does not pick up, so the metric reads zero.

So check which events and metrics are actually available on your system.
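
If you collect the two per-unit counters yourself, you can approximate the missing metric by hand. The following is only a sketch (host-side C++, also valid in a .cu file). It assumes the total query count is simply the sum of tex0_cache_sector_queries and tex1_cache_sector_queries, that each sector is 32 bytes as in the formula above, and that gputime is in microseconds. I have not confirmed that this is how the profiler would combine them. The function name and parameters are made up for illustration and are not part of any NVIDIA API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an NVIDIA API.
// Applies the documented formula
//   tex_cache_sector_queries * 32 / gputime
// under the ASSUMPTION that the missing tex_cache_sector_queries event
// equals tex0_cache_sector_queries + tex1_cache_sector_queries.
// With gputime in microseconds, the result is in bytes per microsecond.
double tex_cache_throughput(std::uint64_t tex0_queries,
                            std::uint64_t tex1_queries,
                            double gputime_us) {
    assert(gputime_us > 0.0);  // avoid dividing by zero
    const double bytes =
        static_cast<double>(tex0_queries + tex1_queries) * 32.0;
    return bytes / gputime_us;
}
```

A nonzero result for the kernel above would suggest that the texture path is in use and that the zero in the profiler is only a missing-event problem.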