Hello, I'm using JCuda and JCublas (version 9.2, matching my CUDA 9.2 installation) for matrix operations.

The Jama library computes on the CPU, while JCublas computes on the GPU.

Theoretically, the matrix multiplication results should be identical, but they are different.

Does anyone know why they differ, and how to make the results the same?

Thank you.

Here is my test code (I used a 3x3 matrix):

```
import Jama.Matrix;
import jcuda.*;
import jcuda.jcublas.*;

public class MatrixMultiplicationTest {
    public static void main(String[] args) {
        double alpha = 1.0;
        double beta = 0.0;
        double[][] dummy = { { 9, 8, 7 },
                             { 6, 5, 4 },
                             { 3, 2, 1 } };
        // Identity matrix
        double[][] e = { { 1, 0, 0 },
                         { 0, 1, 0 },
                         { 0, 0, 1 } };
        int row = dummy.length;
        int col = dummy[0].length;

        // h_A starts as the identity matrix, in column-major order
        double[] h_A = { 1, 0, 0,  0, 1, 0,  0, 0, 1 };
        // h_C starts as all zeros
        double[] h_C = new double[row * col];
        // h_M is "dummy" in column-major order
        double[] h_M = { 9, 6, 3,  8, 5, 2,  7, 4, 1 };

        /* Initialize JCublas */
        JCublas.cublasInit();

        /* Allocate device memory for the matrices */
        Pointer d_A = new Pointer();
        Pointer d_M = new Pointer();
        Pointer d_C = new Pointer();
        JCublas.cublasAlloc(row * col, Sizeof.DOUBLE, d_M);
        JCublas.cublasAlloc(row * col, Sizeof.DOUBLE, d_C);
        JCublas.cublasAlloc(row * col, Sizeof.DOUBLE, d_A);

        /* Initialize the device matrices with the host matrices */
        JCublas.cublasSetVector(row * col, Sizeof.DOUBLE, Pointer.to(h_M), 1, d_M, 1);
        JCublas.cublasSetVector(row * col, Sizeof.DOUBLE, Pointer.to(h_C), 1, d_C, 1);
        JCublas.cublasSetVector(row * col, Sizeof.DOUBLE, Pointer.to(h_A), 1, d_A, 1);

        Matrix matrixM = new Matrix(dummy);
        Matrix matrixC = new Matrix(e);

        int count = 0;
        while (count < 34) {
            // GPU: C = A * M, then A = C, so after the loop C = M^34
            JCublas.cublasDgemm('n', 'n', 3, 3, 3, alpha, d_A, 3, d_M, 3, beta, d_C, 3);
            JCublas.cublasDcopy(row * col, d_C, 1, d_A, 1);
            // CPU: C = C * M, likewise M^34 after the loop
            matrixC = matrixC.times(matrixM);
            count++;
        }

        /* Read the result back */
        JCublas.cublasGetVector(row * col, Sizeof.DOUBLE, d_C, 1, Pointer.to(h_C), 1);

        /* Print the element-wise differences (CPU minus GPU), column by column */
        double[] x = matrixC.getColumnPackedCopy();
        System.out.println(x[0] - h_C[0]);
        System.out.println(x[3] - h_C[3]);
        System.out.println(x[6] - h_C[6]);
        System.out.println(x[1] - h_C[1]);
        System.out.println(x[4] - h_C[4]);
        System.out.println(x[7] - h_C[7]);
        System.out.println(x[2] - h_C[2]);
        System.out.println(x[5] - h_C[5]);
        System.out.println(x[8] - h_C[8]);

        /* Memory clean up */
        JCublas.cublasFree(d_M);
        JCublas.cublasFree(d_C);
        JCublas.cublasFree(d_A);

        /* Shutdown */
        JCublas.cublasShutdown();
    }
}
```
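If exact bit-for-bit equality turns out to be impossible, I assume the usual workaround is to compare the two results with a relative tolerance instead of checking the exact differences. Here is a minimal sketch of what I mean (`nearlyEqual` is my own hypothetical helper, not part of JAMA or JCublas):

```java
public class ToleranceCompare {
    // Relative-tolerance comparison of two doubles (my own sketch)
    static boolean nearlyEqual(double a, double b, double relTol) {
        double diff = Math.abs(a - b);
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return diff == 0.0 || diff <= relTol * scale;
    }

    public static void main(String[] args) {
        double cpu = 1.2345678901234567e20;
        // Simulate a GPU result that differs only in the last few bits
        double gpu = cpu * (1.0 + 1e-15);
        System.out.println(nearlyEqual(cpu, gpu, 1e-12));     // true
        System.out.println(nearlyEqual(cpu, 2 * gpu, 1e-12)); // false
    }
}
```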

JAMA: Java Matrix Package

I referred to: Floating Point and IEEE 754 :: CUDA Toolkit Documentation
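For reference, the effect that document describes can be reproduced in plain Java without any GPU: double addition is not associative, so performing the same sum in a different order (which is exactly what a GPU reduction does compared with a sequential CPU loop) can give a different result. This small demo is my own illustration, not code from either library:

```java
public class FloatNonAssociativity {
    public static void main(String[] args) {
        double a = 1e16;
        double b = -1e16;
        double c = 1.0;
        // The same three numbers, summed in two different orders:
        System.out.println((a + b) + c); // 1.0
        System.out.println(a + (b + c)); // 0.0  (b + c rounds back to -1e16)
    }
}
```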