As we know, when a warp in a kernel stalls on a global memory access or a read-after-write dependency, the scheduler switches to another warp to hide the latency and maximize throughput.
I was wondering if anyone knows how to measure how long a warp switch takes.
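
To make the question concrete, here is a minimal sketch of one way this might be probed. It is not a definitive method. It does not time a single switch; it runs a chain of dependent FMAs (a read-after-write chain) in one block and uses `clock64()` to see how the cycle count seen by warp 0 changes as more warps become resident. The kernel name `dependentChain`, the iteration count, and the warp counts are arbitrary choices for illustration. The CUDA programming guide describes switching between warp contexts as having no cost, so the expectation would be a roughly flat cycle count until the issue slots saturate, but the exact shape depends on the architecture and on how the compiler schedules the `clock64()` reads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical microbenchmark kernel: every thread runs a chain of dependent
// FMAs, so each iteration must wait for the result of the previous one.
__global__ void dependentChain(float *out, long long *cycles, int iters)
{
    float x = threadIdx.x * 1e-3f;

    long long start = clock64();          // per-SM cycle counter
    for (int i = 0; i < iters; ++i) {
        x = x * 1.0001f + 0.5f;           // read-after-write dependency on x
    }
    long long stop = clock64();

    out[threadIdx.x] = x;                 // keep the result live so the loop is not removed
    if (threadIdx.x == 0) {
        *cycles = stop - start;           // elapsed cycles as seen by warp 0
    }
}

int main()
{
    const int iters = 1 << 16;            // assumed chain length, long enough to amortize timer overhead
    float *dOut = nullptr;
    long long *dCycles = nullptr;
    cudaMalloc(&dOut, 1024 * sizeof(float));
    cudaMalloc(&dCycles, sizeof(long long));

    // One block only, so all warps share a single SM. 1024 threads (32 warps)
    // is the per-block limit on current devices.
    for (int warps = 1; warps <= 32; warps *= 2) {
        dependentChain<<<1, warps * 32>>>(dOut, dCycles, iters);
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) {
            printf("launch failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        long long cycles = 0;
        // cudaMemcpy on the default stream waits for the kernel to finish.
        cudaMemcpy(&cycles, dCycles, sizeof(cycles), cudaMemcpyDeviceToHost);
        printf("%2d warps: %lld cycles, %.2f cycles per iteration\n",
               warps, cycles, (double)cycles / iters);
    }

    cudaFree(dOut);
    cudaFree(dCycles);
    return 0;
}
```

If the cycles per iteration for warp 0 stay close to the single-warp value as warps are added, the switches are being absorbed into the dependency latency; any growth would be an upper bound on contention plus switching, not a direct measurement of the switch itself.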
Any suggestions or comments are welcome.