How can I get each CUDA thread's dynamic instruction sequence and num?

How can I get each CUDA thread’s dynamic instruction sequence and instruction count?
I searched for suitable metrics in Nsight Compute but could not find any.

There is no metric that measures the sequence of executed instructions. You can see the code itself on the UI’s Source page.

You can get the number of executed instructions in various forms (per opcode on the Details page, or per address on the Source page), e.g. by collecting `--set full`. You may also want to check the available *inst_executed* metrics listed in the Metrics Reference under Source Metrics. They are part of the Source Counters section (and therefore also of the full set), but can be collected standalone, too. If you want to see the values on the command line, you will need to set the output page to the source page (`--page source`).

Thanks for your guidance.
I am now trying to use the CUDA Binary Utilities (described in the CUDA Toolkit Documentation) to generate the basic-block control flow graph, hoping to derive the control flow (and thus the instruction sequence) of each thread. Unfortunately, the graph is not at thread granularity, so I am not sure how to proceed. Could you suggest any approaches or tools for grouping threads by their control flow (instruction sequence or instruction count)? Thanks a lot!

I will check with the team, but I am not aware of any solution for this. Collecting this sort of information at runtime would likely incur extremely high overhead, and static analysis alone would not be sufficient. You have to take into account that there can be millions of individual threads, all of which can technically execute independently.
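To give a rough sense of the scale involved, here is a back-of-the-envelope sketch. The thread count, instructions per thread, and record size are all assumed, illustrative numbers, not measurements from any tool, and `estimate_trace_bytes` is a hypothetical helper.

```python
def estimate_trace_bytes(num_threads, instructions_per_thread, bytes_per_record):
    """Raw size of a full per-thread instruction trace, with no compression."""
    return num_threads * instructions_per_thread * bytes_per_record


# Assumed, illustrative values (not taken from a real kernel or profiler):
num_threads = 1_000_000           # one million threads in the launch
instructions_per_thread = 1_000   # dynamic instructions executed by each thread
bytes_per_record = 8              # e.g. one 64-bit program counter per instruction

total_bytes = estimate_trace_bytes(num_threads, instructions_per_thread, bytes_per_record)
print(f"{total_bytes / 1e9:.1f} GB of raw trace data")
```

Even with these modest assumptions the raw trace is several gigabytes for a single kernel launch, which is why recording every thread individually is unlikely to be practical.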

Is there a specific optimization you are trying to achieve with this data? Other metrics may be sufficient for it, too.

Thanks for your reply.
Perhaps what I need is only the representative instruction sequences across all threads. In other words, I would like to group the threads and obtain pairs of the form ‘instruction sequence, number of threads following this sequence’, without keeping per-thread detail (because the full trace space is huge).
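The grouping described above could be sketched as follows, assuming the per-thread traces were somehow already available on the host. The thread does not identify a tool that produces them, so the input dictionary, the opcode names in it, and the function `group_threads_by_sequence` are purely hypothetical.

```python
from collections import Counter


def group_threads_by_sequence(traces):
    """Map each distinct instruction sequence to the number of threads that executed it."""
    # Tuples are hashable, so a whole sequence can serve as a Counter key.
    return Counter(tuple(sequence) for sequence in traces.values())


# Hypothetical input: thread id -> opcodes in the order the thread executed them.
example_traces = {
    0: ["LDG", "FADD", "STG"],
    1: ["LDG", "FADD", "STG"],
    2: ["LDG", "BRA", "FMUL", "STG"],
    3: ["LDG", "FADD", "STG"],
}

groups = group_threads_by_sequence(example_traces)

# Print the 'instruction sequence, thread count' pairs, most common first.
for sequence, thread_count in groups.most_common():
    print(thread_count, " ".join(sequence))
```

Only the distinct sequences and their counts are kept, so the result stays small whenever most threads follow a few common paths.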