Is there any way to get performance counter values without using the Visual Profiler? I have a bunch of different CUDA programs whose performance behavior I want to examine using the performance counters available through the Visual Profiler. I could run each program one by one with the Profiler, but it would be much nicer if I could just run them from a command line and still get profiling results. I could write a keyboard/mouse macro that automates the profiler GUI, but if anybody here knows how to get profiling results without the Visual Profiler, please let me know. Thank you!
See /usr/local/cuda/doc/CUDA_Profiler_2.2.txt for a list of the available counters. Searching this forum for related discussion might also help. warp_serialize, divergent_branch, and the coalesced counters are always good starting places.
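For running a batch of programs from the command line, here is a rough sketch in Python. It assumes the text file above describes the usual environment variables for the command-line profiler of that CUDA generation (CUDA_PROFILE, CUDA_PROFILE_CONFIG, CUDA_PROFILE_LOG, CUDA_PROFILE_CSV) and a config file with one counter name per line; please check the names against the doc before relying on them. The counter names gld_incoherent and gst_incoherent are my guess at what "the coalesced counters" refers to, the limit of four counters per run is also an assumption, and the helper names (profiler_env, write_config, profile_all) are made up for this example.

```python
import os
import subprocess
from pathlib import Path

# Counters suggested in the reply. The two *_incoherent names are an
# assumption about which "coalesced counters" are meant; confirm them
# against CUDA_Profiler_2.2.txt.
COUNTERS = ["warp_serialize", "divergent_branch",
            "gld_incoherent", "gst_incoherent"]

# Assumed per-run limit on hardware counters; adjust if the doc says otherwise.
MAX_COUNTERS = 4


def profiler_env(config_path, log_path, base_env=None):
    """Return a copy of the environment with the profiler variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_PROFILE"] = "1"                       # turn profiling on
    env["CUDA_PROFILE_CONFIG"] = str(config_path)   # which counters to collect
    env["CUDA_PROFILE_LOG"] = str(log_path)         # where the log is written
    env["CUDA_PROFILE_CSV"] = "1"                   # comma-separated output
    return env


def write_config(config_path, counters=COUNTERS):
    """Write one counter name per line to the profiler config file."""
    if len(counters) > MAX_COUNTERS:
        raise ValueError("too many counters for a single run")
    Path(config_path).write_text("\n".join(counters) + "\n")


def profile_all(programs, out_dir="profiles"):
    """Run each program once with profiling enabled, one log per program."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    config = out / "counters.cfg"
    write_config(config)
    for prog in programs:
        log = out / (Path(prog).name + ".csv")
        # check=False: keep going even if one program exits with an error.
        subprocess.run([prog], env=profiler_env(config, log), check=False)
```

Calling profile_all(["./progA", "./progB"]) (placeholder program names) should leave one log file per program under profiles/. If you need more counters than fit in one run, write a second config and run the batch again.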