Hi CUDA guys,
I have a number of automatically generated CUDA projects that contain no timing code, and I need to get all kinds of benchmark numbers out of them, quickly.
What is the most efficient way to get these numbers (runtime, kernel launch configuration, detailed resource usage) without modifying the generated code? I’m on Ubuntu 10.04 LTS with CUDA 3.2, a Tesla C2050, and a GeForce 8800 GTX.