Nsight Visual Studio Edition 3.0 and higher support instruction-level counters.
These experiments collect statistics per SASS instruction and roll the data up to PTX and high-level source if -lineinfo or -G was specified during compilation.
The Instruction Count experiment collects:
- instructions executed
- thread instructions executed
- not predicated off thread instructions executed
- histogram of thread instructions executed
- histogram of not predicated off thread instructions executed
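
To make the difference between the three instruction counters concrete, here is a minimal Python sketch of how they could be tallied for one warp-level issue of a single SASS instruction. It assumes a 32-thread warp, and it assumes that "instructions executed" counts once per warp while the two thread-level counters count active threads and active threads whose predicate is true. The function and key names (`count_instruction`, `inst_executed`, and so on) are illustrative, not part of any Nsight API.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp


def popcount(mask: int) -> int:
    """Number of set bits in the low WARP_SIZE bits of a thread mask."""
    return bin(mask & ((1 << WARP_SIZE) - 1)).count("1")


def count_instruction(active_mask: int, predicate_mask: int) -> dict:
    """Counter increments for one warp executing one instruction.

    active_mask:    bit i set if thread i is active in the warp
    predicate_mask: bit i set if the instruction's predicate is true for thread i
    """
    return {
        # One increment per warp, regardless of how many threads are active.
        "inst_executed": 1,
        # One increment per active thread.
        "thread_inst_executed": popcount(active_mask),
        # One increment per active thread that is not predicated off.
        "not_pred_off_thread_inst_executed": popcount(active_mask & predicate_mask),
    }


# Fully active warp, predicate true everywhere.
full = count_instruction(0xFFFFFFFF, 0xFFFFFFFF)
# Fully active warp, but the predicate is true for only the low 16 threads.
half = count_instruction(0xFFFFFFFF, 0x0000FFFF)
print(full)
print(half)
```

Under these assumptions, a large gap between the thread count and the not-predicated-off count on a given SASS line suggests that many threads issue the instruction without doing useful work.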
The Branch Statistics experiment collects:
- branch instructions executed
- branch taken
- divergent branch taken
The Memory Transactions experiment collects data for generic, local, and shared memory instructions:
- memory instructions executed
- l1 transactions
- l1 ideal transactions
- histogram of transactions
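
One way to read the transaction counters is as a ratio of actual to ideal transactions per memory instruction. The sketch below is a deliberately simplified model, not the hardware's actual rules: it assumes a 128-byte line size, naturally aligned accesses that never straddle a line, and it ignores shared memory bank conflicts entirely. The helper names are hypothetical.

```python
import math

LINE_BYTES = 128  # assumption: transaction granularity in bytes


def transactions(addresses, line_bytes=LINE_BYTES):
    """Distinct lines touched by one warp-wide access (simplified model)."""
    return len({addr // line_bytes for addr in addresses})


def ideal_transactions(num_threads, bytes_per_thread, line_bytes=LINE_BYTES):
    """Fewest lines that could hold the requested bytes if perfectly packed."""
    return math.ceil(num_threads * bytes_per_thread / line_bytes)


# 32 threads each loading a 4-byte word.
coalesced = [4 * t for t in range(32)]   # consecutive words
strided = [128 * t for t in range(32)]   # one word per 128-byte line

ideal = ideal_transactions(32, 4)
ratio_coalesced = transactions(coalesced) / ideal
ratio_strided = transactions(strided) / ideal
print(ratio_coalesced, ratio_strided)
```

In this model a ratio near 1 means the access pattern is close to ideal, and a SASS line with a high ratio is a candidate for a closer look at its addressing.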
The information is displayed in table form and in a side-by-side source code view.
The source code view supports heat maps and correlation between high-level source and SASS.
It is highly recommended that users of this view focus on the SASS information. The compiler applies many optimizations, so the high-level source and PTX statistics must be interpreted in the context of the SASS code. For example, a loop that the compiler unrolled is obvious in the correlation map but not necessarily obvious in the high-level source statistics.
The contents of the source view can be copied and pasted into a file to allow additional processing. @allanmac's simple grep of cuobjdump output is a nice example of what can be done. It can be modified to also increment by one of the exported counters (e.g. inst_executed).
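
As a rough illustration of that kind of post-processing, the Python sketch below builds an opcode histogram weighted by a counter instead of counting each SASS line once. The layout of the pasted text is an assumption: it is treated as tab-separated with a header row, and the column names `SASS` and `inst_executed` are hypothetical, so adjust them to whatever the copied data actually contains. The sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def opcode_histogram(pasted_text, sass_column="SASS", counter_column="inst_executed"):
    """Sum a per-instruction counter by SASS opcode.

    Assumes tab-separated text with a header row; the column names are
    hypothetical and may differ in a real export.
    """
    reader = csv.DictReader(io.StringIO(pasted_text), delimiter="\t")
    totals = Counter()
    for row in reader:
        tokens = (row.get(sass_column) or "").split()
        # Skip a leading predicate guard such as "@P0" or "@!P0".
        if tokens and tokens[0].startswith("@"):
            tokens = tokens[1:]
        if not tokens:
            continue
        # Keep the base mnemonic: "LD.E" -> "LD", "EXIT;" -> "EXIT".
        opcode = tokens[0].split(".")[0].rstrip(";")
        totals[opcode] += int(row.get(counter_column) or 0)
    return totals


# Made-up sample in the assumed layout.
sample = (
    "SASS\tinst_executed\n"
    "MOV R1, c[0x0][0x44];\t64\n"
    "@P0 BRA 0x48;\t64\n"
    "LD.E R2, [R4];\t32\n"
    "MOV R3, R2;\t32\n"
)

hist = opcode_histogram(sample)
for opcode, count in hist.most_common():
    print(opcode, count)
```

Passing a different `counter_column` gives the same histogram weighted by another exported counter.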