I have created a set of tools that automate the profiling of CUDA applications. While the Nsight profiler can collect all the counters and metrics for an application, it requires manual user intervention and is therefore not suitable for batch profiling. The command-line profiler, on the other hand, can be used in scripts, but it does not handle incompatibilities among counters: the user has to define a separate configuration for each set of compatible counters and schedule a run for each one.
The main tool is a Python script that collects performance counters. You can specify as many counters as you want, and the script takes care of grouping them and running the program as many times as necessary. It outputs a single file per GPU containing all the results. It can also generate a configuration file for the current machine and provides a GUI to modify that file easily. I plan to expand its functionality to include metrics and further data analysis. I hope you find it useful. Feel free to propose new features or contribute to the project.
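To give a rough idea of what grouping counters means, here is a minimal sketch and not the actual code of the script. It assumes a hypothetical `compatible` predicate that says whether two counters can be collected in the same run, and a hypothetical `max_per_run` limit on counters per run. The function name `group_counters` and the counter names are placeholders for illustration only.

```python
from typing import Callable, Iterable, List


def group_counters(
    counters: Iterable[str],
    compatible: Callable[[str, str], bool],
    max_per_run: int = 4,
) -> List[List[str]]:
    """Greedily split counters into groups that can share one profiling run.

    Each counter goes into the first group that still has room and whose
    members are all compatible with it; otherwise a new group is started.
    This is a simple first-fit heuristic, so the number of groups is not
    guaranteed to be minimal.
    """
    groups: List[List[str]] = []
    for counter in counters:
        for group in groups:
            has_room = len(group) < max_per_run
            if has_room and all(compatible(counter, other) for other in group):
                group.append(counter)
                break
        else:
            # No existing group can take this counter.
            groups.append([counter])
    return groups


# Hypothetical example: placeholder counters "a" and "b" cannot be
# collected together, and at most two counters fit in one run.
conflicts = {frozenset({"a", "b"})}


def example_compatible(x: str, y: str) -> bool:
    return frozenset({x, y}) not in conflicts


example_groups = group_counters(["a", "b", "c", "d"], example_compatible, max_per_run=2)
print(example_groups)  # [['a', 'c'], ['b', 'd']]
```

Each resulting group would then correspond to one profiler configuration and one run of the application, with the per-run results merged into a single output file afterwards.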