Hi,
I am using the Tesla C870 hardware. Where can I find performance information such as how many cycles an SP takes to finish a multiplication operation?
Thank you!
The pipeline latency of a multiplication is 24 cycles. Please see Volkov's paper:
Vasily Volkov, James W. Demmel, Benchmarking GPUs to Tune Dense Linear Algebra. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing. IEEE Press, Piscataway, NJ, USA, 2008.
You can download it via this thread: http://forums.nvidia.com/index.php?showtopic=89084
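If you want to check the number on your own card, here is a rough sketch of the usual approach: time a chain of dependent multiplications in a single thread with the device-side clock() function and divide by the chain length. This is not the code from the paper. The names mulLatencyKernel, cyclesPerOp and CHAIN_LENGTH are made up for illustration, and the result also includes the overhead of reading the clock, so treat it as an estimate. The compiler may also reorder or fuse instructions, so it is worth inspecting the generated code before trusting the figure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of dependent multiplications in the timed chain (arbitrary choice).
#define CHAIN_LENGTH 256

// Average cycles per operation from a raw cycle count (hypothetical helper).
__host__ __device__ inline float cyclesPerOp(unsigned int cycles, int ops) {
    return (float)cycles / (float)ops;
}

// A single thread runs a chain in which every multiply needs the previous
// result, so no other work can hide the pipeline latency.
__global__ void mulLatencyKernel(float seed, float factor,
                                 float *result, unsigned int *cycles) {
    float x = seed;
    unsigned int start = (unsigned int)clock();
#pragma unroll
    for (int i = 0; i < CHAIN_LENGTH; ++i) {
        x = x * factor;  // depends on the previous iteration
    }
    unsigned int stop = (unsigned int)clock();
    *result = x;            // store the value so the chain is not optimized away
    *cycles = stop - start; // unsigned subtraction tolerates one counter wrap
}

int main() {
    float *dResult = 0;
    unsigned int *dCycles = 0;
    if (cudaMalloc(&dResult, sizeof(float)) != cudaSuccess ||
        cudaMalloc(&dCycles, sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // factor is passed at run time so the multiplies cannot be constant-folded.
    mulLatencyKernel<<<1, 1>>>(1.0f, 1.0f, dResult, dCycles);

    unsigned int hCycles = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&hCycles, dCycles, sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("about %.1f cycles per dependent multiply\n",
           cyclesPerOp(hCycles, CHAIN_LENGTH));

    cudaFree(dResult);
    cudaFree(dCycles);
    return 0;
}
```

Launching with one block of one thread is deliberate: with more warps resident the scheduler overlaps their instructions, and you would measure throughput rather than latency.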