I want to measure the performance of my application on a C870 in FLOPS (floating-point operations per second), but I cannot figure out how to do it with the CUDA 3.1 Compute Profiler. How can I do this?
Also, does the Compute Profiler show the number of floating-point instructions executed?