Is there any way to get performance counter values without using the Visual Profiler? I have a bunch of different CUDA programs whose performance behavior I want to look into using the performance counters available through the Visual Profiler. I could run each program one by one with the Profiler, but it would be much nicer if I could just run them from a command line and still get profiling results. I could write a keyboard/mouse macro that automates the profiler GUI, but if anybody here knows how to get profiling results without the Visual Profiler, please let me know. Thank you!
This is exactly what I want to do. Thank you! Is there any documentation on the environment variables? I looked at the Visual Profiler release notes and user manual, but found nothing helpful.
See /usr/local/cuda/doc/CUDA_Profiler_2.2.txt for a list. Searching this forum for related discussion might help. warp_serialize, divergent_branch, and the coalesced counters are always good starting places.
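For batch runs, here is a rough sketch of how the environment-variable approach could be scripted. It assumes the variable names CUDA_PROFILE, CUDA_PROFILE_LOG, and CUDA_PROFILE_CONFIG, which is what I remember the command-line profiler of that era using; please check them against CUDA_Profiler_2.2.txt before relying on this. The helper names (write_counter_config, profiler_env, run_profiled) are made up for illustration, and only the two counters named above are used. Look up the exact names of the coalescing counters in the same file.

```python
import os
import subprocess


def write_counter_config(path, counters):
    """Write a profiler config file with one counter name per line."""
    with open(path, "w") as f:
        for name in counters:
            f.write(name + "\n")


def profiler_env(log_path, config_path, base_env=None):
    """Return a copy of the environment with profiling switched on.

    ASSUMPTION: the variable names below are recalled from the legacy
    command-line profiler; verify them in CUDA_Profiler_2.2.txt.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_PROFILE"] = "1"                  # enable profiling
    env["CUDA_PROFILE_LOG"] = log_path         # where the log is written
    env["CUDA_PROFILE_CONFIG"] = config_path   # file listing the counters
    return env


def run_profiled(cmd, log_path, config_path):
    """Run one program with profiling enabled and return its exit code."""
    env = profiler_env(log_path, config_path)
    return subprocess.run(cmd, env=env).returncode
```

You would then call write_counter_config("counters.cfg", ["warp_serialize", "divergent_branch"]) once and loop over your programs with something like run_profiled(["./my_program"], "my_program.log", "counters.cfg"), where ./my_program is a placeholder for each of your binaries. Giving each run its own log file keeps the results separate. The profiler may limit how many counters it collects in a single run, so check the doc file for that as well.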