The list of execution stall reasons in [1] is incomplete: I see two more metrics for compute capability 5.x. They are:
Warp not selected (stall_not_selected)
Miscellaneous (stall_other)
The descriptions in [2] are not very informative.
1- Under what circumstances is a warp not selected? For example, if it is waiting for data from memory (load/store), then stall_memory_dependency already covers that. Or if it is waiting for an instruction fetch, then stall_inst_fetch covers that.
2- What exactly does “other” mean? Is there an example of it?
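To make question 1 concrete, here is one possible reading as a toy model. It is a sketch under assumptions, not something taken from [1] or [2]: a warp would count as "not selected" when it is ready to issue but the scheduler picks a different ready warp in that cycle. The function name `classify_cycle`, the state strings, and the "first ready warp wins" pick order are all hypothetical and only serve the illustration; real hardware may choose differently.

```python
from collections import Counter


def classify_cycle(warp_states, issue_slots=1):
    """Label every warp for a single scheduler cycle (toy model).

    warp_states: list of strings, either "ready" or a blocking reason
    such as "memory_dependency" or "inst_fetch" (hypothetical labels).
    issue_slots: how many warps the scheduler may issue this cycle.
    """
    labels = []
    slots_left = issue_slots
    for state in warp_states:
        if state != "ready":
            # Blocked warp: the stall is charged to its blocking reason.
            labels.append(state)
        elif slots_left > 0:
            # Ready warp that gets an issue slot (assumed in-order pick).
            labels.append("issued")
            slots_left -= 1
        else:
            # Ready warp, but the issue slots went to other warps.
            labels.append("not_selected")
    return labels


cycle = ["memory_dependency", "ready", "ready", "inst_fetch"]
labels = classify_cycle(cycle)
print(labels)
print(Counter(labels))
```

Under this reading, the third warp is not waiting on memory or on an instruction fetch at all, so neither stall_memory_dependency nor stall_inst_fetch would apply to it. Whether this matches what the profiler actually counts is exactly what I am asking.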
[1] https://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/issueefficiency.htm
[2] https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference-5x