Are there any tools that can collect instruction information for a CUDA program?

I want to collect instruction information for a CUDA program, such as how many ‘ld’ instructions it contains, how many instructions it has in total, etc.

I know I could probably calculate these by reading the .ptx file after compiling with ‘-keep’, but I would still like to know whether there is a tool that can do this job…

Thank you all!!
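The PTX-reading approach mentioned in the question only takes a few lines of script. The following is a minimal Python sketch, assuming a .ptx file kept by compiling with ‘-keep’; `count_ptx_opcodes` is a hypothetical helper, not part of any NVIDIA tool. It relies on simple heuristics (skip directives, labels, braces and comments; strip an optional `@%p`/`@!%p` guard; drop modifiers so `ld.global.f32` counts as `ld`) rather than a real PTX parser, so unusual layouts such as multi-line `call` argument lists will be miscounted.

```python
import re
from collections import Counter

# Optional guard predicate ("@%p1" or "@!%p1"), then the opcode with its
# modifiers, e.g. "ld.global.f32". Lines starting with ".", "$", "{", "}",
# ")" or "//" do not match, so directives, comments and "$" labels are skipped.
_PTX_INSN = re.compile(r"^(?:@!?%\w+\s+)?([a-z][\w.]*)")


def count_ptx_opcodes(ptx_text: str) -> Counter:
    """Statically tally base PTX opcodes ("ld", "st", "add", ...) in a listing."""
    counts = Counter()
    for raw in ptx_text.splitlines():
        line = raw.strip()
        if not line or line.endswith(":"):  # blank line or label such as "loop:"
            continue
        match = _PTX_INSN.match(line)
        if match:
            # "ld.global.f32" -> "ld"
            counts[match.group(1).split(".")[0]] += 1
    return counts
```

For example, `count_ptx_opcodes(open("kernel.ptx").read())["ld"]` (the file name is hypothetical) gives the number of `ld` variants in the listing, and summing the counter's values gives the total. This is a static count of instructions in the code, not of instructions executed.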

Maybe use NVIDIA Nsight or the NVIDIA Visual Profiler. I think both will do this, and both are free.

Note that you are probably not interested in analysis at the PTX level, which is merely an intermediate representation, but in analysis of the output of “cuobjdump -sass”. The latter shows you the shader assembly (SASS) code that is actually executed on the device.

I don’t know of any particular tools for its analysis, but quite a bit can be achieved with standard Unix tools like grep and wc -l.
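As an illustration of that kind of text processing, here is a minimal Python sketch that tallies opcodes in the output of “cuobjdump -sass”. It assumes each instruction line begins with an address comment such as `/*0048*/`, optionally followed by a guard predicate like `@!P0`; the exact layout varies between CUDA versions and architectures, so the regular expression is a starting point rather than a definitive parser, and `count_sass_opcodes` is a hypothetical name.

```python
import re
from collections import Counter

# An instruction line contains an address comment like "/*0048*/",
# optionally followed by a guard predicate ("@P0", "@!P0"), then the opcode
# with its modifiers, e.g. "LDG.E.SYS". Encoding-only comments such as
# "/* 0x2202e2c2... */" have a space after "/*" and therefore do not match.
_SASS_INSN = re.compile(r"/\*[0-9a-fA-F]+\*/\s*(?:@!?\w+\s+)?([A-Z][A-Z0-9_.]*)")


def count_sass_opcodes(sass_text: str) -> Counter:
    """Statically tally base SASS opcodes ("LDG", "FADD", ...) in cuobjdump output."""
    counts = Counter()
    for line in sass_text.splitlines():
        match = _SASS_INSN.search(line)
        if match:
            counts[match.group(1).split(".")[0]] += 1  # "LD.E" -> "LD"
    return counts
```

Saving the listing with `cuobjdump -sass app > app.sass` (where `app` stands for your executable) and passing the file contents to the function gives per-opcode counts such as `LD`, `LDG` or `LDS`; their sum is the total static instruction count.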

As @tera says, grep and other Unix tools can do most of what you want.

A useful grep from another thread in this forum is:

This will produce a list of instructions and their counts. This was tested against CUDA 5.5’s cuobjdump.exe.

Also, the Performance Analysis function of Nsight 3.1 with CUDA 5.5 now seems to produce instruction count information. Look under the “CUDA Source Profiler” node in the results tree.

I’ve only tried it once, but it shows where branches and loads/stores occur in the source code, along with line-by-line statistics. It looks very useful.

Nsight Visual Studio Edition 3.0 and higher support instruction level counters.

These experiments collect statistics per SASS instruction and roll the data up to PTX and high-level source if -lineinfo or -G was specified during compilation.
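Conceptually, that roll-up is a group-by: each SASS instruction carries a counter value and a source line, and the per-line figure is the sum over all SASS instructions mapped to that line. A small Python sketch of the aggregation, with invented input data (the tuples are illustrative, not real Nsight output, and `roll_up_to_source` is a hypothetical name):

```python
from collections import defaultdict


def roll_up_to_source(sass_rows):
    """Sum a per-SASS counter into per-source-line totals.

    sass_rows: iterable of (source_line, counter_value) pairs, one per SASS
    instruction. Several SASS instructions may map to the same source line,
    for example when the compiler unrolls a loop.
    """
    per_line = defaultdict(int)
    for source_line, value in sass_rows:
        per_line[source_line] += value
    return dict(per_line)
```

With four unrolled copies of line 12, `roll_up_to_source([(12, 256)] * 4 + [(13, 64)])` reports 1024 for line 12, even though the source shows only one statement there.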

The Instruction Count experiment collects:

  • instructions executed
  • thread instructions executed
  • not predicated off thread instructions executed
  • histogram of thread instructions executed
  • histogram of not predicated off thread instructions executed
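How these counters relate can be illustrated with a simplified model. The Python sketch below assumes a hypothetical trace in which each executed warp-level instruction is described by a 32-bit active-thread mask and a 32-bit guard-predicate mask. Nsight gathers the real numbers from hardware counters, not from such a trace, and `WarpInstr`, `instruction_counts` and the dictionary keys are invented names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WarpInstr:
    """One executed warp-level instruction in a hypothetical trace."""
    active_mask: int     # bit i set: thread i of the warp is active
    predicate_mask: int  # bit i set: thread i's guard predicate is true


def popcount(x: int) -> int:
    return bin(x).count("1")


def instruction_counts(trace):
    """Model the Instruction Count statistics for a list of WarpInstr."""
    inst_executed = 0
    thread_inst_executed = 0
    not_pred_off = 0
    hist_thread = Counter()        # active threads per instruction -> frequency
    hist_not_pred_off = Counter()  # enabled threads per instruction -> frequency
    for ins in trace:
        active = popcount(ins.active_mask)
        enabled = popcount(ins.active_mask & ins.predicate_mask)
        inst_executed += 1  # counted once per warp
        thread_inst_executed += active
        not_pred_off += enabled
        hist_thread[active] += 1
        hist_not_pred_off[enabled] += 1
    return {
        "inst_executed": inst_executed,
        "thread_inst_executed": thread_inst_executed,
        "not_predicated_off_thread_inst_executed": not_pred_off,
        "thread_inst_histogram": dict(hist_thread),
        "not_predicated_off_histogram": dict(hist_not_pred_off),
    }
```

In this model, dividing thread instructions executed by instructions executed gives the average number of active threads per warp instruction; values well below 32 point at divergence or partially filled warps.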

The Branch Statistics experiment collects:

  • branch instructions executed
  • branch taken
  • divergent branch taken
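A similar sketch for the branch counters, using the same kind of hypothetical per-warp masks. It assumes, as a modelling choice rather than Nsight's documented definition, that a branch counts as taken when at least one active thread takes it and as divergent when the active threads disagree; `branch_stats` is an invented name.

```python
def branch_stats(branches):
    """branches: iterable of (active_mask, taken_mask) pairs, one per
    warp-level branch execution; bit i refers to thread i of the warp."""
    executed = taken = divergent = 0
    for active_mask, taken_mask in branches:
        takers = active_mask & taken_mask  # only active threads count
        executed += 1
        if takers:                         # at least one active thread branches
            taken += 1
            if takers != active_mask:      # but not all of them: the warp splits
                divergent += 1
    return {"branch_executed": executed,
            "branch_taken": taken,
            "divergent_branch_taken": divergent}
```

Under these assumptions, a branch on a condition such as `threadIdx.x < 16` in a full warp is both taken and divergent, while a condition that is uniform across the warp never contributes to the divergent count.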

The Memory Transactions experiment collects data for generic, local, and shared memory instructions:

  • memory instructions executed
  • l1 transactions
  • l1 ideal transactions
  • histogram of transactions
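The gap between “l1 transactions” and “l1 ideal transactions” comes from the access pattern. The following Python sketch models one warp-level load or store under a deliberately simplified assumption: one transaction per distinct 128-byte line touched, and an ideal count equal to the bytes requested divided by 128, rounded up. Real transaction sizes and caching rules differ by architecture, and `l1_transactions` and `LINE_BYTES` are hypothetical names.

```python
LINE_BYTES = 128  # assumed L1 line size for this model


def l1_transactions(addresses, access_bytes):
    """Model one warp-level memory instruction.

    addresses: byte addresses accessed by the active threads.
    Returns (transactions, ideal_transactions), assuming one transaction per
    distinct LINE_BYTES-aligned line touched.
    """
    lines = set()
    for addr in addresses:
        first = addr // LINE_BYTES
        last = (addr + access_bytes - 1) // LINE_BYTES
        lines.update(range(first, last + 1))
    ideal = -(-len(addresses) * access_bytes // LINE_BYTES)  # ceiling division
    return len(lines), ideal
```

In this model, 32 threads reading consecutive aligned 4-byte words need 1 transaction (ideal 1), the same reads shifted by 4 bytes need 2, and a 128-byte stride needs 32 while the ideal stays at 1.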

The information is displayed in table form and in a side-by-side source code view.

The source code view supports heat maps and correlation between high-level source and SASS.

It is highly recommended that users of this view focus on the SASS information. The compiler applies many optimizations, so the user must interpret the high-level and PTX source information in the context of the SASS code. For example, the compiler may unroll a loop; this is obvious in the correlation map but not necessarily in the high-level source statistics.

The contents of the source view can be copied and pasted into a file for additional processing. @allanmac's simple grep of the cuobjdump output is a nice example of what can be done. It could also be modified to increment by the value of one of the exported counters (e.g. inst_executed) instead of by one.
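As a sketch of that modification, the Python function below assumes the source view has been pasted into CSV text with one row per SASS instruction, one column holding the SASS text and one holding the counter. The column names `Source` and `inst_executed` are assumptions, so adjust them to the header the paste actually contains; `weighted_opcode_counts` is a hypothetical name. Instead of adding one per line, it adds the counter value for each opcode.

```python
import csv
import io
import re
from collections import Counter

# Optional guard predicate ("@P0", "@!P0"), then the SASS opcode with modifiers.
_OPCODE = re.compile(r"^\s*(?:@!?\w+\s+)?([A-Z][A-Z0-9_.]*)")


def weighted_opcode_counts(csv_text, sass_col="Source", counter_col="inst_executed"):
    """Sum counter_col per base SASS opcode over the rows of a pasted CSV export."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = _OPCODE.match(row.get(sass_col) or "")
        # Tolerate thousands separators such as "1,024" and empty cells.
        value = (row.get(counter_col) or "").replace(",", "").strip()
        if match and value:
            totals[match.group(1).split(".")[0]] += int(value)  # instead of += 1
    return totals
```

If the paste is tab-separated rather than comma-separated, the reader would need `delimiter="\t"`.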

@Greg @ NV

Can Nsight Eclipse Edition do this job?