sac
December 15, 2010, 10:28am
1
Hi All,
What is the formula for computing GFLOPS for GEMM? I have used the following formulas; please
give your feedback.
DGEMM and SGEMM = (2*M*N*K) (timeInSec) / (1024^3) // factor 2: 1 mult + 1 addition
CGEMM and ZGEMM = (6*M*N*K) (timeInSec) / (1024^3) // factor 6: 4 (complex mult) + 2 (complex addition)
Regards,
Sachin
avidday
December 15, 2010, 11:38am
2
The standard BLAS gemm operation is
C <- alpha * AB + beta*C
so off the top of my head, the total flop count for the scalar version should be M(2NK) + MN + 2MN = MN(2K+3). This means the throughput of the operation should be computed as
MN(2K+3) / (1000^3 * time in seconds)
for the result to be in Giga flop per second. So by my reckoning, your formulas are incorrect in several places.
EDITED for mixing up K and N in the original post.
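As a rough sketch of the formula above, the count and the decimal giga divisor can be written in a few lines of Python. The function names, the 4096 problem size, and the 0.5 s timing are made up for illustration and do not come from the thread.

```python
def gemm_flops_scalar(m, n, k):
    """Flop count M*N*(2K+3) for C <- alpha*A*B + beta*C with real inputs."""
    return m * n * (2 * k + 3)


def gflops(flops, seconds):
    """Throughput in GFLOP/s, using the decimal prefix giga = 1000^3."""
    return flops / (1000 ** 3 * seconds)


# Hypothetical example: a 4096 x 4096 x 4096 GEMM timed at 0.5 s.
m = n = k = 4096
seconds = 0.5
full = gemm_flops_scalar(m, n, k)   # includes the alpha/beta terms
approx = 2 * m * n * k              # common shortcut that ignores them
print(gflops(full, seconds))
print(gflops(approx, seconds))
```

For K = 4096 the extra 3MN term changes the result by well under 0.1%, whereas dividing by 1024^3 instead of 1000^3 would under-report throughput by roughly 7%.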
1 Like
Lev
December 15, 2010, 1:58pm
3
The standard BLAS gemm operation is
C <- alpha * AB + beta*C
so off the top of my head, the total flop count for the scalar version should be M(2NK) + MK + 2MK = MK(2N+3). This means the throughput of the operation should be computed as
MK(2N+3) / (1000^3 * time in second)
for the result to be in Giga flop per second. So by my reckoning, your formulas are incorrect in several places.
(a+bi)(c+di) = (ac - bd) + (ad + bc)i
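The identity above is where the per-operation costs for the complex case come from: one complex multiplication takes 4 real multiplications and 2 real additions (6 flops), and one complex addition takes 2 real additions. A minimal Python sketch, with hypothetical helper names:

```python
def complex_mul(a, b, c, d):
    """(a+bi)(c+di) in real arithmetic: 4 multiplications + 2 additions."""
    real = a * c - b * d  # 2 mult + 1 add (the subtraction)
    imag = a * d + b * c  # 2 mult + 1 add
    return real, imag


def complex_add(a, b, c, d):
    """(a+bi) + (c+di) in real arithmetic: 2 additions."""
    return a + c, b + d


print(complex_mul(1, 2, 3, 4))
print(complex_add(1, 2, 3, 4))
```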
avidday
December 15, 2010, 2:06pm
4
Your point being? I posted what I believe to be the correct operations count for the scalar sgemm/dgemm case. Is there an error?
Lev
December 15, 2010, 11:16pm
5
I am sorry, I wanted to reply to the thread author.
sac
December 16, 2010, 5:56am
6
Let's do a detailed analysis for both scalar and complex inputs.
Count Total Operations
C ← alpha * AB + beta*C
Operations(AB) = MNK (mult) + MN(K-1) (add)
Operations(alpha*AB) = MNK (mult) + MN(K-1) (add) + MN (mult)
Operations(alpha*AB + beta*C) = MNK (mult) + MN(K-1) (add) + MN (mult) + MN (mult) + MN (add)
Total = MN(K+2) (mult) + MNK (add)
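One way to sanity-check these totals is to run a naive GEMM and tally every multiplication and addition as it happens. This is only a sketch, assuming plain Python lists for the matrices; the function name is hypothetical.

```python
def gemm_with_counts(alpha, A, B, beta, C):
    """Naive C <- alpha*A*B + beta*C that also counts real mults and adds."""
    m, k, n = len(A), len(B), len(B[0])
    mults = adds = 0
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Dot product of length k: k multiplications, k-1 additions.
            acc = A[i][0] * B[0][j]
            mults += 1
            for p in range(1, k):
                acc += A[i][p] * B[p][j]
                mults += 1
                adds += 1
            # Scale by alpha, scale C by beta, then add: 2 mult + 1 add.
            out[i][j] = alpha * acc + beta * C[i][j]
            mults += 2
            adds += 1
    return out, mults, adds


M, N, K = 2, 3, 4
A = [[1.0] * K for _ in range(M)]
B = [[1.0] * N for _ in range(K)]
C = [[1.0] * N for _ in range(M)]
result, mults, adds = gemm_with_counts(2.0, A, B, 3.0, C)
print(mults, M * N * (K + 2))  # counted vs. formula
print(adds, M * N * K)         # counted vs. formula
```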
GFLOPS for DGEMM and SGEMM
Total Operations = MN(2K+2)
GFLOPS = MN(2K+2) / (1000^3 * timeInSec)
GFLOPS for CGEMM and ZGEMM
Total Operations = MN(K+2)*6 + MNK*2 = 8MNK + 12MN // 6 (complex mult) + 2 (complex addition)
GFLOPS = (8MNK + 12MN) / (1000^3 * (timeInSec))
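The two cases above can be folded into one helper. This is a sketch under the assumptions of this post: 6 real flops per complex multiplication, 2 per complex addition, and alpha and beta of the same type as the matrices. The function and parameter names are illustrative only.

```python
def gemm_flops(m, n, k, complex_inputs=False):
    """Real flop count for C <- alpha*A*B + beta*C."""
    mults = m * n * (k + 2)  # MN(K+2) multiplications
    adds = m * n * k         # MNK additions
    if complex_inputs:
        return 6 * mults + 2 * adds  # 8MNK + 12MN
    return mults + adds              # MN(2K+2)


def gemm_gflops(m, n, k, seconds, complex_inputs=False):
    """Throughput in GFLOP/s (giga = 1000^3) for a measured time in seconds."""
    return gemm_flops(m, n, k, complex_inputs) / (1000 ** 3 * seconds)


print(gemm_flops(2, 3, 4))
print(gemm_flops(2, 3, 4, complex_inputs=True))
```

Usage would be something like `gemm_gflops(M, N, K, timeInSec)` for SGEMM/DGEMM and `gemm_gflops(M, N, K, timeInSec, complex_inputs=True)` for CGEMM/ZGEMM.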
Waiting for your feedback.
2 Likes