I’m using cublasSgemm from CUBLAS 1.0 to measure the numerical performance of my GPUs.
However, the results from cublasSgemm differ from the results I get with ATLAS.
After some experimenting, I noticed that I get the correct results if I swap matrix A and matrix B in the call.
Is this a bug in CUBLAS,
or am I using it incorrectly?
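
To make the swap concrete, here is a minimal CPU-only sketch of what I think may be going on. It rests on an assumption I have not confirmed as the cause: that my arrays are stored row-major (as usual in C) while the GEMM routine reads them as column-major (Fortran-style). Under that assumption, passing B first and A second computes the transpose of A*B in column-major, which is the same memory as A*B in row-major. `sgemm_colmajor`, `matmul_rowmajor` and `swapped_call_matches_rowmajor` are hypothetical helper names for illustration only; `sgemm_colmajor` is not the real cublasSgemm signature (no transpose flags, alpha = 1, beta = 0).

```c
#include <stdio.h>

/* Hypothetical stand-in for a column-major SGEMM (alpha = 1, beta = 0,
   no transposes): C(m x n) = A(m x k) * B(k x n), all column-major. */
static void sgemm_colmajor(int m, int n, int k,
                           const float *A, int lda,
                           const float *B, int ldb,
                           float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float s = 0.0f;
            for (int p = 0; p < k; ++p)
                s += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = s;
        }
    }
}

/* Plain row-major reference: C(m x n) = A(m x k) * B(k x n). */
static void matmul_rowmajor(int m, int n, int k,
                            const float *A, const float *B, float *C)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int p = 0; p < k; ++p)
                s += A[i * k + p] * B[p * n + j];
            C[i * n + j] = s;
        }
    }
}

/* Returns 1 if the swapped column-major call reproduces the row-major
   product element for element, 0 otherwise. */
int swapped_call_matches_rowmajor(void)
{
    enum { M = 2, K = 3, N = 4 };
    /* Row-major inputs; small integers so float sums are exact. */
    const float A[M * K] = { 1, 2, 3,
                             4, 5, 6 };
    const float B[K * N] = { 1, 0, 2, 1,
                             0, 1, 1, 2,
                             3, 1, 0, 1 };
    float ref[M * N];
    float got[M * N];

    matmul_rowmajor(M, N, K, A, B, ref);

    /* Read as column-major, the buffer of A is A^T (K x M, ld = K) and the
       buffer of B is B^T (N x K, ld = N). Computing B^T * A^T gives
       (A*B)^T as an N x M column-major matrix (ld = N), which occupies
       memory exactly like A*B in row-major. Note the swapped operands. */
    sgemm_colmajor(N, M, K, B, N, A, K, got, N);

    for (int i = 0; i < M * N; ++i) {
        if (got[i] != ref[i])
            return 0;
    }
    return 1;
}
```

Calling `swapped_call_matches_rowmajor()` from a small `main` and printing the return value is enough to try it. If the same reasoning applies to cublasSgemm, the swap would be expected behaviour rather than a bug, but I would appreciate confirmation.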