In relation to another post I have opened, I would like to understand how the two metrics mentioned in the topic title are calculated. I wrote this small kernel:
__global__ void test(float *in, float2 *out)
{
    int x = threadIdx.x;
    float a = __ldg(in + x);
    atomicAdd(&out[x].x, a); // out[x]=(float)(x*j);
    atomicAdd(&out[x].y, a);
}
and ran it on a P100 with 1 block of 32 threads (i.e. one warp).
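For anyone who wants to reproduce this, here is a minimal host-side sketch of such a launch. It is not the exact code that was run: the helper name run_test_once, the buffer names d_in and d_out, the all-ones input and the zero-initialised output are assumptions made for illustration. The nvprof command in the comment is likewise just one plausible way to collect the three metrics.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the post: one read-only load, then two float atomic adds.
__global__ void test(float *in, float2 *out)
{
    int x = threadIdx.x;
    float a = __ldg(in + x);
    atomicAdd(&out[x].x, a);
    atomicAdd(&out[x].y, a);
}

// Hypothetical helper: launches the kernel once with 1 block of 32 threads
// and copies the 32 results back into h_out. Returns 0 on success.
int run_test_once(float2 *h_out)
{
    const int n = 32;                      // one warp
    float h_in[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1.0f;                    // illustrative input values (assumption)

    float *d_in = nullptr;
    float2 *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float2)) != cudaSuccess) {
        cudaFree(d_in);
        return 1;
    }

    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float2));   // start the accumulators at zero

    test<<<1, n>>>(d_in, d_out);
    cudaError_t err = cudaGetLastError();       // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, n * sizeof(float2), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

// Build (assumed): nvcc -arch=sm_60 -o hitrate hitrate.cu
// Profile (assumed):
//   nvprof --metrics global_hit_rate,tex_cache_hit_rate,l2_tex_read_hit_rate ./hitrate
int main()
{
    float2 h_out[32];
    if (run_test_once(h_out) != 0)
        return 1;
    printf("out[0] = (%f, %f)\n", h_out[0].x, h_out[0].y);
    return 0;
}
```

With a single launch, every output element receives exactly one add per component, so the input values should not matter for the hit-rate question; only the access pattern does.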
The SASS looks like this:
code for sm_60
Function : _Z4testPfP6float2
.headerflags @"EF_CUDA_SM60 EF_CUDA_PTX_SM(EF_CUDA_SM60)"
/* 0x083fc400e3e007f6 */
/*0008*/ MOV R1, c[0x0][0x20]; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_TID.X; /* 0xf0c8000002170000 */
/*0018*/ ISCADD R4.CC, R0.reuse, c[0x0][0x140], 0x2; /* 0x4c18810005070004 */
/* 0x001dc400fc4007ec */
/*0028*/ SHR R5, R0, 0x1e; /* 0x3829000001e70005 */
/*0030*/ IADD.X R5, R5, c[0x0][0x144]; /* 0x4c10080005170505 */
/*0038*/ LDG.E.CI R4, [R4]; /* 0xeed4a00000070404 */
/* 0x001f8800fd8207f1 */
/*0048*/ ISCADD R2.CC, R0.reuse, c[0x0][0x148], 0x3; /* 0x4c18818005270002 */
/*0050*/ SHR R0, R0, 0x1d; /* 0x3829000001d70000 */
/*0058*/ IADD.X R3, R0, c[0x0][0x14c]; /* 0x4c10080005370003 */
/* 0x001ffc001e2040f1 */
/*0068*/ RED.E.ADD.F32.FTZ.RN [R2], R4; /* 0xebf9000000370204 */
/*0070*/ RED.E.ADD.F32.FTZ.RN [R2+0x4], R4; /* 0xebf9000040370204 */
/*0078*/ EXIT; /* 0xe30000000007000f */
/* 0x001f8000fc0007ff */
/*0088*/ BRA 0x80; /* 0xe2400fffff07000f */
/*0090*/ NOP; /* 0x50b0000000070f00 */
/*0098*/ NOP; /* 0x50b0000000070f00 */
/* 0x001f8000fc0007e0 */
/*00a8*/ NOP; /* 0x50b0000000070f00 */
/*00b0*/ NOP; /* 0x50b0000000070f00 */
/*00b8*/ NOP; /* 0x50b0000000070f00 */
............................
When profiling, I get these hit rates:
Invocations  Metric Name           Metric Description                  Min      Max      Avg
1            global_hit_rate       Global Hit Rate in unified l1/tex   50.00%   50.00%   50.00%
1            tex_cache_hit_rate    Unified Cache Hit Rate              83.33%   83.33%   83.33%
1            l2_tex_read_hit_rate  L2 Hit Rate (Texture Reads)         0.00%    0.00%    0.00%
I can't understand these rates. For example, why is global_hit_rate 50%? There is only one load, so I would expect 0%. The atomicAdds are clearly affecting tex_cache_hit_rate, but I can't find any reason why it should be 83.33%.
Any help clearing this up would be really appreciated.