Automating benchmarking of CUDA projects

Hi CUDA guys,

I have a number of automatically generated CUDA projects that contain no timing code, and I need to get all kinds of benchmark numbers out of them, quickly.

How can I get the numbers (runtime, kernel launch configuration, detailed resource usage) in the most efficient way? I'm on Ubuntu 10.04 LTS with CUDA 3.2, a Tesla C2050, and an 8800 GTX.

Thanks, Ana

Looks like the new Compute Profiler, driven through its environment variables, is just what I need. Consider the question solved :)
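
In case it helps anyone else, here is a rough sketch of how the profiler can be switched on from a script without touching the generated code. The environment variable names (COMPUTE_PROFILE, COMPUTE_PROFILE_CSV, COMPUTE_PROFILE_LOG, COMPUTE_PROFILE_CONFIG) and the config option names are as I remember them from the profiler notes shipped with the toolkit, so treat them as assumptions and check them against your version. The helper names (profiler_env, write_profiler_config, run_profiled) are made up for this example.

```python
import os
import subprocess

# Assumed option names for the profiler config file (one per line).
# Verify them against the profiler documentation for your toolkit version.
PROFILER_OPTIONS = [
    "gridsize",         # kernel launch grid dimensions
    "threadblocksize",  # kernel launch block dimensions
    "regperthread",     # registers used per thread
    "stasmemperblock",  # static shared memory per block
    "dynsmemperblock",  # dynamic shared memory per block
]


def profiler_env(log_path, config_path, base_env=None):
    """Return a copy of the environment with profiling switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["COMPUTE_PROFILE"] = "1"                  # enable the profiler
    env["COMPUTE_PROFILE_CSV"] = "1"              # comma-separated log output
    env["COMPUTE_PROFILE_LOG"] = log_path         # where the log is written
    env["COMPUTE_PROFILE_CONFIG"] = config_path   # extra columns to record
    return env


def write_profiler_config(config_path, options=PROFILER_OPTIONS):
    """Write the requested profiler options, one per line."""
    with open(config_path, "w") as f:
        f.write("\n".join(options) + "\n")


def run_profiled(command, log_path, config_path):
    """Run one benchmark binary with profiling enabled; return its exit code."""
    write_profiler_config(config_path)
    return subprocess.call(command, env=profiler_env(log_path, config_path))
```

For each project I would call something like run_profiled(["./my_project"], "my_project.csv", "profile.cfg"), where the binary and file names are placeholders, and then collect the per-kernel rows from the CSV logs.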