Hello,
I’m trying to multiply two matrices with the cublasSgemm function:
A
0.1 0.01
0.001 0
0 0
At
0.1 0.001 0
0.01 0 0
B
1 2
3 4
5 6
I must execute the product At*B, which gives:
AtB
0.103 0.204
0.01 0.02
It works fine with GotoBLAS2, but now I would like to accelerate the processing by using CUBLAS. I tried to use it, but since CUBLAS assumes the data is laid out as in Fortran (column-major), here are the matrices it actually works with (the f suffix is for Fortran, the t for transpose):
Af
0.1 0
0.01 0
0.001 0
Aft
0.1 0.01 0.001
0 0 0
Bf
1 4
2 5
3 6
AftBf
0.123 0.456
0 0
(AftBf)c
0.123 0
0.456 0
(It took me a long time to work out how it arrived at that final matrix.)
I work in the optimization field and I don’t want to rewrite my matrices in column-major order. What can I do to use CUBLAS in CUDA so that it multiplies row-major matrices?
Thank you.
EDIT :
I realized that if you swap the dimensions of the matrices (m×n to n×m) and also swap the operands in the product (AtB ==> BAt), it theoretically works in Fortran (the f suffix is for Fortran, the t for transpose):
A (3x2)
0.1 0.01
0.001 0
0 0
At (2x3)
0.1 0.001 0
0.01 0 0
B (3x2)
1 2
3 4
5 6
At*B
0.103 0.204
0.01 0.02
Bf (2x3)
1 3 5
2 4 6
Af (2x3)
0.1 0.001 0
0.01 0 0
Aft
0.1 0.01
0.001 0
0 0
Bf * Aft
0.103 0.01
0.204 0.02
When that final matrix is read in C, it gives :
0.103 0.204
0.01 0.02
At*B !!!!
I tried this with 3 matrices:
V (m×r = 3×2, C storage):
0.100000 0.010000 0.001000 0.200000 0.020000 0.002000
M (m×n = 3×2, C storage):
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000
W (n×r = 2×2, C storage):
17.739658 28.783649 14.911300 19.856079
I have to compute VtM, WMt, VtV and WWt. I use these four settings for the function calls:
cublasSgemm('n', 't', n, r, m, 1.0, ****, n, cuV, m, 0.0, cuVtM, n);
cublasSgemm('n', 't', r, r, m, 1, cuV, r, cuV, m, 0, cuVtV, r);
cublasSgemm('n', 't', m, r, n, 1.0, cuMt, m, cuWt, n, 0.0, cuWMt, m);
cublasSgemm('n', 't', r, r, n, 1, cuWt, r, cuWt, n, 0, cuWWt, r);
Results are incorrect for the first two products but okay for the last two:
CuVtM :
1.408013 1.849615 3.104844 3.741813
VtM :
0.203000 0.324000 0.620000 0.832000
CuVtV :
0.013032 0.041283 0.013159 0.005314
VtV :
0.010401 0.001240 0.001240 0.040104
CuWMt :
75.306961 168.353577 261.400177 54.623459 124.158218 193.692963
WMt :
75.306961 168.353577 261.400208 54.623459 124.158218 193.692963
CuWWt :
1143.193970 836.051758 836.051758 616.610718 0.000000 0.000000
WWt :
1143.193970 836.051758 836.051758 616.610718 0.000000 0.000000
I realized that the last two results are only correct because W is a square matrix (otherwise it no longer works).
Does anyone have an idea where I made a mistake, and what I should do to get the correct results?
Thank you.