I am writing a .NET managed wrapper for CUBLAS in C++/CLI. In my test suite, I usually compare the results of BLAS operations on the device (using CUBLAS) against results from the Intel MKL.
For SGEMM vs cublasSgemm, I fill a matrix with random numbers in (0, 1) and multiply it by its transpose.
The two calls give slightly different results, and the differences grow as I scale the random values up. Isn't this weird?
Even weirder: if I scale the numbers by a large factor (say 10^6), the difference between corresponding results is always a power of two (512, 1024, and 2048, for example). The matrices are of moderate size (100 by 10).
Has anyone run into this before?