Flop/s measurement

I want to measure the performance of my application on a C870 in FLOP/s (floating-point operations per second). I cannot figure out how to do it with the CUDA 3.1 Compute Profiler. How can I do it?

Also, does the Compute Profiler show the number of floating-point instructions executed?

Thank you in advance.

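Count the floating-point operations (FLOPs) your kernel performs, then divide that count by the time the kernel takes to execute. For example, a kernel that performs 2 × 10^9 operations in 0.5 s is running at 4 GFLOP/s.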
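
As a rough illustration of that approach, here is a minimal sketch that times a kernel with CUDA events and divides a hand-counted FLOP total by the elapsed time. The `saxpy` kernel, the `gflops` helper, and the problem size are assumptions made up for this example, not anything from your application; substitute your own kernel and your own operation count. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: y[i] = a * x[i] + y[i].
// Counted by hand, that is 2 FLOPs per element (one multiply, one add).
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper: convert a FLOP count and a time in milliseconds to GFLOP/s.
double gflops(double flop_count, double elapsed_ms)
{
    return flop_count / (elapsed_ms * 1.0e-3) / 1.0e9;
}

int main()
{
    const int n = 1 << 22;                    // number of elements (assumed size)
    const size_t bytes = n * sizeof(float);

    float *x = 0, *y = 0;
    cudaMalloc((void **)&x, bytes);
    cudaMalloc((void **)&y, bytes);
    cudaMemset(x, 0, bytes);
    cudaMemset(y, 0, bytes);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup costs are not included in the timing.
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);

    // Time a single launch of the kernel.
    cudaEventRecord(start, 0);
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);               // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds

    const double flop_count = 2.0 * n;        // hand-counted: 2 FLOPs per element
    printf("time = %f ms, rate = %f GFLOP/s\n", ms, gflops(flop_count, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

For a very short kernel a single launch can be too brief to time reliably, so it may help to launch it several times between the two events and divide the elapsed time by the number of launches.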
