Tool/simulator to monitor the instructions executed by each thread?

Hi All,

Say I have multiple control-flow divergences in a kernel, so each thread in a warp may take a different path.

Is there a way, tool, or simulator to monitor the instructions executed by each thread?

Thanks in advance,
Teguh