cuBLAS returns a different result from the CPU

Hi, I tested the cuBLAS functions. The BLAS Level 1 functions work correctly, but the result of the BLAS Level 2 function GEMV is odd.

Without transpose, GEMV returns the same result as the CPU, but with transpose the result is slightly different.

To compute GEMV on the CPU, I wrote a simple loop like the one below.

for (int i = 0; i < N; i++)
{
   // Transposed GEMV, element i: y[i] = alpha * sum_j A[j][i] * x[j] + beta * y[i]
   float cpuResult = 0.0f;

   for (int j = 0; j < M; j++)
   {
      cpuResult += alpha * _hostInputMatrixSingle[GetIndex(j, i)] * _hostInputVectorXTSingle[j];
   }

   cpuResult = cpuResult + beta * _hostInputVectorYTSingle[i];

   // Exact equality check against the cuBLAS output
   Assert.AreEqual(cpuResult, _hostOutputVectorYTSingle[i]);
}

If the matrix dimension is small, there is no problem, but if it is large, there is a slight difference between the CPU and GPU results.

What is the problem?

Oh, I found the error myself. The result is too large to be stored exactly in the float type, so when I changed the matrix and vector values to smaller ones, the error went away. :)