I’m using nvprof to profile my CUDA code. The metric I care about is how many clock cycles the SM is completely idle. So I measured three metrics with nvprof: eligible_warps_per_cycle, sm_efficiency, and achieved_occupancy. But the results are confusing:
Kernel: scatterDevice(float*, edgeBlk_t*, msgBlk_t*)
1 eligible_warps_per_cycle 0.032613
1 sm_efficiency_instance 100.00%
1 sm_efficiency 100.00%
1 achieved_occupancy 0.485068
If I understand correctly, eligible_warps_per_cycle is the average number of warps that are eligible to issue per active cycle, and sm_efficiency is the percentage of time at least one warp is active on the SM. So why is sm_efficiency 100% while eligible_warps_per_cycle is only 0.03?
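One way to check whether these numbers actually contradict each other is a quick back-of-the-envelope calculation. The Python sketch below assumes the metric definitions stated above, plus the fact that the number of eligible warps in any single cycle is a non-negative integer. The helper names and MAX_WARPS_PER_SM = 64 are hypothetical, and the real per-SM warp limit depends on the GPU architecture. Under those assumptions, an average of 0.032613 eligible warps per active cycle means that at most about 3.3% of active cycles had any warp ready to issue, even though some warp was resident (active) on the SM the whole time.

```python
# Values reported by nvprof for scatterDevice (copied from the output above).
eligible_warps_per_cycle = 0.032613  # mean eligible warps per *active* cycle
sm_efficiency = 1.00                 # fraction of time >= 1 warp is active
achieved_occupancy = 0.485068        # mean active warps / max warps per SM

# Hypothetical: max resident warps per SM (64 on many GPUs; check your architecture).
MAX_WARPS_PER_SM = 64


def idle_fraction(sm_eff: float) -> float:
    """Fraction of elapsed time with no active (resident) warp on the SM."""
    return 1.0 - sm_eff


def min_fraction_no_eligible(avg_eligible: float) -> float:
    """Lower bound on the fraction of active cycles with zero eligible warps.

    The eligible-warp count per cycle is a non-negative integer, so the
    fraction of cycles with at least one eligible warp cannot exceed its
    average (Markov's inequality with threshold 1).
    """
    return max(0.0, 1.0 - avg_eligible)


def avg_resident_warps(occupancy: float, max_warps: int) -> float:
    """Average number of active warps per active cycle, from achieved occupancy."""
    return occupancy * max_warps


if __name__ == "__main__":
    print(f"SM fully idle (no active warp): {idle_fraction(sm_efficiency):.1%}")
    print("Active cycles with no eligible warp: >= "
          f"{min_fraction_no_eligible(eligible_warps_per_cycle):.1%}")
    print("Average resident warps per SM: ~"
          f"{avg_resident_warps(achieved_occupancy, MAX_WARPS_PER_SM):.1f}")
```

Read this way, the two metrics measure different things: sm_efficiency counts cycles where warps are merely resident, while eligible_warps_per_cycle counts only warps that are ready to issue in that cycle, so resident-but-stalled warps raise the first without raising the second.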
Thank you in advance.