Hi, I tested the cuBLAS functions and the BLAS Level 1 functions work correctly. But with the BLAS Level 2 functions, GEMV's result looks odd.
Without transpose, GEMV returns the same result as my CPU reference, but with transpose (i.e. y = alpha * A^T * x + beta * y) the result is slightly different.
To compute the CPU-side GEMV for comparison, I use a simple loop like the one below.
// CPU reference for y[i] = alpha * (A^T * x)[i] + beta * y[i]
for (int i = 0; i < N; i++)
{
    float cpuResult = 0.0f;
    for (int j = 0; j < M; j++)
    {
        // accumulate the dot product for output element i of the transposed product
        cpuResult += alpha * _hostInputMatrixSingle[GetIndex(j, i)] * _hostInputVectorXTSingle[j];
    }
    cpuResult = cpuResult + beta * _hostInputVectorYTSingle[i];

    // compare against the vector copied back from the cuBLAS GEMV call
    Assert.AreEqual(cpuResult, _hostOutputVectorYTSingle[i]);
}
If the matrix dimensions are small, there is no problem, but when the dimensions are large there is a small difference between the CPU and GPU results.
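In case it helps to quantify the difference, I could replace the exact-equality assert with a relative-tolerance check like the sketch below (the helper name and the 1e-5 tolerance are just placeholders I made up, not values from my test suite):

static void AssertNearlyEqual(float cpu, float gpu, float relTol = 1e-5f)
{
    // Scale the allowed error by the magnitude of the values, so larger
    // accumulated sums are permitted proportionally more rounding error.
    float scale = Math.Max(Math.Abs(cpu), Math.Abs(gpu));
    Assert.IsTrue(Math.Abs(cpu - gpu) <= relTol * scale,
        $"cpu={cpu}, gpu={gpu}, diff={Math.Abs(cpu - gpu)}");
}

Inside the loop I would then call AssertNearlyEqual(cpuResult, _hostOutputVectorYTSingle[i]) instead of Assert.AreEqual.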
What is the problem?