Could someone please suggest the most accurate way to time cublasSgemm?
The machine is an AMD Opteron, so the RDTSC instruction cannot be used.
(An AMD technical bulletin says that the processor provides power management mechanisms that independently adjust the performance state (“P-state”) and power state (“C-state”) of the processor; these state changes can affect the rate at which a processor core’s Time Stamp Counter (TSC) is incremented.
Applications should avoid using the TSC directly (through the RDTSC instruction) for
time keeping and instead rely on the appropriate operating system calls.)
When I use the gettimeofday function, a 1000*1000 matrix multiplication takes about 0.001046 seconds,
or 957 Gflops/sec. Unfortunately, one can’t rely on the accuracy of the gettimeofday function.
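
For reference, here is a rough sketch in C of the arithmetic behind a throughput figure like the one above. It assumes the conventional count of 2*N^3 floating-point operations for an N x N by N x N multiply. The helper names wall_seconds and sgemm_gflops are made up for illustration and are not part of CUBLAS. With that count, 0.001046 s corresponds to about 1912 GFLOP/s; a figure near 957 is roughly what you get if you count N^3 operations instead.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: wall-clock time in seconds from gettimeofday.
   The resolution is at best one microsecond. */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Hypothetical helper: GFLOP/s for an n x n by n x n SGEMM.
   Assumption: 2*n^3 flops (one multiply and one add per inner step). */
static double sgemm_gflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * n * n * n / seconds / 1e9;
}
```

The intended use would be to call wall_seconds() before and after the timed region and pass the difference to sgemm_gflops(1000.0, elapsed). This only measures host time between the two calls; if the timed GPU call returns before the work has finished, a synchronization is needed before the second wall_seconds() call, otherwise the elapsed time will be too short.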