I want to verify the behavior of the unified L1/tex cache and the L2 cache on my Maxwell GPU, using two simple kernels:
```
__global__ void init(int n, float *x, float *y)
{
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
    {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
}
```
```
__global__ void add(int n, float *x, float *y)
{
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
```
nvprof gives the results shown below:
```
Invocations  Metric Name         Metric Description                 Min     Max     Avg
Device "GeForce GTX TITAN X (0)"
  Kernel: init(int, float*, float*)
          1  global_hit_rate     Global Hit Rate in unified l1/tex  0.00%   0.00%   0.00%
          1  tex_cache_hit_rate  Unified Cache Hit Rate             50.00%  50.00%  50.00%
  Kernel: add(int, float*, float*)
          1  global_hit_rate     Global Hit Rate in unified l1/tex  0.00%   0.00%   0.00%
          1  tex_cache_hit_rate  Unified Cache Hit Rate             50.00%  50.00%  50.00%
```
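The metrics were collected with an nvprof invocation along these lines (`./app` is a placeholder for the compiled binary name):

```
nvprof --metrics global_hit_rate,tex_cache_hit_rate ./app
```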
The global_hit_rate is zero. I found some explanations saying this is because L1 caching of global loads is disabled by default on Maxwell. But if that is the case, why is tex_cache_hit_rate not zero as well? Could someone help me make sense of these puzzling metrics? Any ideas are appreciated. Thank you very much.