The minimum sampling interval of PC sampling is 32 cycles. But in the source code details, each line of assembly code has a corresponding warp state. How is this done? Theoretically, HMMA execution only lasts 2 cycles.
Hi, @antonio.msi
Thanks for starting a new topic! I checked your question internally; here is some information about PC sampling.
PC sampling selects a random warp every Nth cycle on every SM. For the selected warp, we collect the Program Counter (PC) and the Stall Reason. A kernel usually runs on many SMs and in many waves (that is, if the grid is large enough, some warps run sequentially with respect to others), so this statistical sampling will eventually collect information for every executed SASS instruction. If a kernel executes only a single warp (or only very few warps), we might not obtain sampling information for all executed instructions.
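To make the statistical argument concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions, not a model of the actual hardware or of Nsight Compute internals: the function name `sampled_pcs` and all of its parameters are hypothetical, every instruction occupies its warp for a fixed 2 cycles, and random start cycles stand in for warps being scheduled in different waves.

```python
import bisect
import random

def sampled_pcs(num_warps, latencies, interval=32, window=0, seed=0):
    """Toy model (hypothetical, not the real sampler): every `interval`
    cycles, pick one random active warp and record the PC it is at."""
    rng = random.Random(seed)
    # pc_at[c] = index of the instruction a warp is at, c cycles after it starts
    pc_at = [pc for pc, lat in enumerate(latencies) for _ in range(lat)]
    lifetime = len(pc_at)
    # Assumption: warps start at random cycles in [0, window], mimicking waves
    starts = sorted(rng.randrange(window + 1) for _ in range(num_warps))
    seen = set()
    for t in range(0, window + lifetime, interval):
        # Active warps satisfy: t - lifetime < start <= t
        lo = bisect.bisect_right(starts, t - lifetime)
        hi = bisect.bisect_right(starts, t)
        if lo == hi:
            continue  # no warp is running at this sample point
        start = starts[rng.randrange(lo, hi)]
        seen.add(pc_at[t - start])
    return seen

# 64 instructions, each occupying the warp for 2 cycles (assumed latency)
program = [2] * 64

single = sampled_pcs(num_warps=1, latencies=program)
many = sampled_pcs(num_warps=40000, latencies=program, window=200000)

print("instructions in program:", len(program))
print("covered with 1 warp:", len(single))
print("covered with many warps:", len(many))
```

In this model the single warp lives for only 128 cycles, so a 32-cycle interval can hit at most four of its instructions. With many warps starting at different times, each sample lands on a different offset into the program, and the samples together cover the whole instruction stream even though each instruction is far shorter than the sampling interval.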
You can also refer to "Become Faster in Writing Performant CUDA Kernels using the Source Page in Nsight Compute" on NVIDIA On-Demand for more details about PC sampling data collection. Thanks!
Really appreciate your reply!