Hi!
While reading the description of the NCU metric sm__ctas_launched_total, I came across the term “preemption-restore events”, and I don’t know exactly what it means. Could you explain it?
The compute preemption feature provides a way to prevent long-running kernels from monopolizing the GPU, at the cost of the context-switch overhead that preemption introduces. The preemption-restore events referred to in the _total metric correspond to this overhead. You can also find more information on this feature here.
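To check whether a given GPU exposes this feature at all, a minimal sketch along these lines may help. It assumes device 0 and queries two attributes through the CUDA runtime: cudaDevAttrComputePreemptionSupported and cudaDevAttrKernelExecTimeout. The file name in the comment is only illustrative.

```cuda
// preemption_query.cu (illustrative name). Build: nvcc -o preemption_query preemption_query.cu
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int device = 0;  // assumption: the GPU of interest is device 0
    int preemptionSupported = 0;
    int kernelTimeoutEnabled = 0;

    // 1 if the device supports compute preemption, 0 otherwise.
    cudaError_t err = cudaDeviceGetAttribute(
        &preemptionSupported, cudaDevAttrComputePreemptionSupported, device);
    if (err == cudaSuccess) {
        // 1 if a run-time limit (watchdog) applies to kernels on this device.
        err = cudaDeviceGetAttribute(
            &kernelTimeoutEnabled, cudaDevAttrKernelExecTimeout, device);
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("compute preemption supported: %d\n", preemptionSupported);
    std::printf("kernel execution timeout enabled: %d\n", kernelTimeoutEnabled);
    return 0;
}
```

If the first value is 0, preemption-restore events are unlikely to show up on that device, so the two CTA metrics would be expected to match.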
If I wanted an example that distinguishes the NCU metrics sm__ctas_launched_total and sm__ctas_launched, what kind of demo would you recommend? I suppose a simple kernel would not show the difference.
Hi, felix_dt! I have been trying to work out which situations count as preemption events, but I still cannot find a concrete case. Could you give me a more detailed sample to help me understand these three metrics?
Nsight Compute increases the compute preemption timeout to multiple seconds while profiling. This reduces the number of times a profiled kernel is preempted, which makes the per-kernel results more precise. To still see preemption events, you could run two applications: one with an (infinitely) long-running kernel, and a second, profiled one with a kernel that runs for at least ~5 seconds. You should then see that sm__ctas_launched_total.sum is potentially larger than sm__ctas_launched.sum:
$ infiniteKernel &
$ ncu --metrics sm__ctas_launched.sum,sm__ctas_launched_total.sum,gpu__time_duration.sum ./waitKernel
[...]
---------------------------- ------ -----
gpu__time_duration.sum       second  4.39
sm__ctas_launched.sum        block      1
sm__ctas_launched_total.sum  block      2
---------------------------- ------ -----
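The sources of infiniteKernel and waitKernel are not shown in this thread, so the following is only a sketch of what such programs might look like, not the original code. It uses one hypothetical program, spin, for both roles: the argument is the spin duration in seconds, and 0 means "never stop". The kernel launches a single block, matching the sm__ctas_launched.sum value of 1 above, and reads the %globaltimer PTX register to measure elapsed time.

```cuda
// spin.cu (hypothetical name): stand-in for both demo programs in this thread.
// Build:  nvcc -o spin spin.cu
// Run:    ./spin 0   -> spins until the process is killed
//         ./spin 5   -> spins for roughly 5 seconds
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Reads the GPU's nanosecond timer through inline PTX.
__device__ unsigned long long globalTimerNs()
{
    unsigned long long ns;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(ns));
    return ns;
}

// Busy-waits until durationNs has elapsed; durationNs == 0 means never stop.
__global__ void spinKernel(unsigned long long durationNs)
{
    const unsigned long long start = globalTimerNs();
    while (true) {
        const unsigned long long elapsed = globalTimerNs() - start;
        if (durationNs != 0 && elapsed >= durationNs) {
            break;
        }
    }
}

int main(int argc, char** argv)
{
    // Default of 5 seconds follows the "at least ~5 seconds" suggestion above.
    const double seconds = (argc > 1) ? std::atof(argv[1]) : 5.0;
    if (seconds < 0.0) {
        std::fprintf(stderr, "usage: %s [seconds, 0 = forever]\n", argv[0]);
        return 1;
    }
    const unsigned long long durationNs =
        static_cast<unsigned long long>(seconds * 1e9);

    spinKernel<<<1, 1>>>(durationNs);  // one block with one thread
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // blocks until the kernel exits
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("kernel finished after ~%.1f s\n", seconds);
    return 0;
}
```

With this sketch, the two commands above would become `./spin 0 &` followed by the same ncu invocation on `./spin 5`. Whether sm__ctas_launched_total.sum actually exceeds sm__ctas_launched.sum depends on the GPU, the driver, and whether a preemption happens during the profiled kernel. On a GPU that also drives a display, a kernel execution timeout may terminate the infinite kernel, so remember to kill the background process afterwards in any case.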