Hardware Performance Data

Hi,

I am using the Tesla C870 hardware. Where can I find performance information such as how many cycles an SP takes to finish a multiplication operation?

Thank you!

The pipeline latency of a multiplication is 24 cycles. Please see Volkov’s paper:

Vasily Volkov, James W. Demmel. Benchmarking GPUs to Tune Dense Linear Algebra. In SC ’08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. Piscataway, NJ, USA, 2008, IEEE Press.

You can download it via this thread: http://forums.nvidia.com/index.php?showtopic=89084
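
To give a rough feel for what a 24-cycle latency means in practice, here is a small back-of-the-envelope sketch in Python. Only the 24-cycle figure comes from this thread. The 1.35 GHz shader clock, 8 SPs per multiprocessor and 32 threads per warp are my assumptions for G80-class hardware such as the C870, and the function names are made up for illustration, so please check the numbers against the paper and your own card.

```python
# Back-of-the-envelope numbers for a 24-cycle arithmetic pipeline latency.
# Assumed (not stated in this thread): G80-class figures of a 1.35 GHz
# shader clock, 8 SPs per multiprocessor, and 32 threads per warp.

LATENCY_CYCLES = 24        # figure quoted in the reply above
SHADER_CLOCK_HZ = 1.35e9   # assumption
SPS_PER_SM = 8             # assumption
WARP_SIZE = 32             # assumption


def latency_ns(cycles, clock_hz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9


def warps_to_hide(latency_cycles, warp_size, sps):
    """Warps needed so that a dependent instruction never stalls."""
    # One warp instruction keeps the SPs busy for warp_size / sps cycles.
    issue_cycles = warp_size // sps
    # Ceiling division: enough independent warps to cover the whole latency.
    return -(-latency_cycles // issue_cycles)


ns = latency_ns(LATENCY_CYCLES, SHADER_CLOCK_HZ)
warps = warps_to_hide(LATENCY_CYCLES, WARP_SIZE, SPS_PER_SM)
threads = warps * WARP_SIZE

print(f"latency is roughly {ns:.1f} ns")
print(f"warps needed to hide it: {warps} ({threads} threads)")
```

Under these assumptions, a single thread waits 24 cycles between dependent multiplications, but with enough warps resident on a multiprocessor the hardware can switch between them and the latency stays hidden.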