In the description of the %warpid and %smid registers in ptx_isa_3.1.pdf, it is mentioned that their value “may change during execution, e.g. due to rescheduling of threads following preemption”. Does this mean that GPU threads can be preempted, for example with their state flushed to memory? When could that happen? (While debugging?)
I suspect preemption must also happen in Dynamic Parallelism in order to allow kernel launches from the device to execute.
Both CUDA Dynamic Parallelism and the Nsight Visual Studio Edition CUDA debugger in “pre-emption mode” (single-GPU debugging mode) can cause %smid and %warpid to change. The special registers %pm0-7 and %clock64 may also have incorrect values in these two cases.
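If you want to check this on your own setup, here is a minimal sketch that reads %smid and %warpid twice per thread through inline PTX and counts the threads that saw a different value the second time. The helper names (read_smid, read_warpid, sample_ids), the launch configuration, and the busy loop are my own assumptions for illustration, not anything from the PTX manual. The asm statements are marked volatile so the compiler does not merge the two reads into one.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the PTX special register %smid.
__device__ unsigned int read_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Hypothetical helper: read the PTX special register %warpid.
__device__ unsigned int read_warpid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%warpid;" : "=r"(id));
    return id;
}

// Sample both registers, do some work, sample again, and count
// the threads for which either value differs between the samples.
__global__ void sample_ids(unsigned int *changed, unsigned int *sink) {
    unsigned int sm0 = read_smid();
    unsigned int w0  = read_warpid();

    // Arbitrary busy work so there is some time between the two samples.
    unsigned int acc = threadIdx.x;
    for (int i = 0; i < 100000; ++i) {
        acc = acc * 1664525u + 1013904223u;
    }
    sink[blockIdx.x * blockDim.x + threadIdx.x] = acc;  // keep the loop alive

    unsigned int sm1 = read_smid();
    unsigned int w1  = read_warpid();
    if (sm0 != sm1 || w0 != w1) {
        atomicAdd(changed, 1u);
    }
}

int main() {
    const int blocks = 64, threads = 128;  // arbitrary launch configuration
    unsigned int *d_changed = nullptr, *d_sink = nullptr;
    cudaMalloc(&d_changed, sizeof(unsigned int));
    cudaMalloc(&d_sink, blocks * threads * sizeof(unsigned int));
    cudaMemset(d_changed, 0, sizeof(unsigned int));

    sample_ids<<<blocks, threads>>>(d_changed, d_sink);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned int h_changed = 0;
    cudaMemcpy(&h_changed, d_changed, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("threads that observed a changed %%smid or %%warpid: %u\n", h_changed);

    cudaFree(d_changed);
    cudaFree(d_sink);
    return 0;
}
```

Whatever count this prints in an ordinary run, it does not prove the values are stable. The practical takeaway is to treat %smid and %warpid as a snapshot for profiling or diagnostics and not to index per-SM or per-warp data with them if the code may run under the debugger's pre-emption mode or use Dynamic Parallelism.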
Great, that explains it. Thanks!