I have two questions concerning matrix-matrix multiplication (using either the matrixMul SDK example or the CUBLAS sgemm routine directly). The tests were performed with randomly generated matrices at dimensions ranging from 16x16 to 4096x4096.
When the results of either the CUBLAS routine or the matrixMul kernel are compared against the computeGold "simple" CPU implementation, we observe an average error of about 10e-4 and an error percentage between 40 and 50 %. How can this be explained?
When we take the peak performance of the G80 into consideration, which is 345 GFLOPS (or 230.4 GFLOPS, respectively), the measured performance is a bit disappointing: 65.5 GFLOPS with CUBLAS sgemm, and even less, 30.2 GFLOPS, with the matrixMul example. Why is that?