CUBLAS matrix multiplication

Hello. Sorry for my English, it’s not my native language. Please help me fix my code. I am trying to multiply three matrices: A (m x n), B (n x k), and D (k x l).

This code works sometimes, but sometimes it does not.
For example, with
"
const int m=100;
const int n=101;
const int k=102;
const int l=103;
"
I get
"
(:
Device: nan; Host: 28234250977280.000000
):
"

The first multiplication is correct (smile), but the second is not.

With
"
const int m=110;
const int n=39;
const int k=112;
const int l=132;
"

I get

"
** On entry to SGEMM parameter number 10 had an illegal value
Multiplication failed. (1)
"

Multiplication does not start.

With
"
const int m=11;
const int n=11;
const int k=11;
const int l=11;
"
I get the right answer, but I don’t want to multiply only square matrices (and it does not work for all sizes).

Here is the code: http://pastebin.com/Jz3gubV1. I can’t paste it directly in this message (the topic won’t post).

Sorry, I am not using the “code” tag now, because it works badly (maybe only in my Firefox 16?).

Keep in mind that the storage convention used by CUBLAS for two-dimensional matrices is column-major ordering (the elements of a column occupy consecutive storage locations). This is the ordering used by Fortran and Matlab, for example. C and C++ use row-major ordering. Consequently, the “leading dimension” arguments (LDA, LDB, LDC) passed to *GEMM would be equal to the number of rows in each matrix for your examples.

We are aware of issues with the “code” tag in the new forums, sorry for the problems with that. I have raised the issue internally before and will do so again. What issues did you see specifically? The one I have encountered is that line-continuation backslashes in multiline macros get eliminated. Also, one cannot simply cut & paste from a “code” section because line numbers have been added.

Sorry, I did not understand exactly. I read that there is column-major ordering (here: http://docs.nvidia.com/cuda/cublas/index.html#topic_7_2), and it says that lda is the number of rows in the matrix. I thought that since the matrices are treated as column-major, and I have matrices A (m x n) and B (n x k), when I pass them to cublasSgemm I must think of it as multiplying B (k x n) by A (n x m) (I must reverse the order). So we have:

cublasSgemm(handle,
			CUBLAS_OP_N, CUBLAS_OP_N,
			k, m, n,
			scal,      // alpha (1)
			dev_B, k,  // ?
			dev_A, n,  // ?
			(scal+1),  // beta (0)
			dev_C, k); // ?

and so we get C (k x m) in column-major terms; read back on the host in row-major order, it is C (m x k).

When I tried to use 4 code tags, I got one message per tag, and I could not post the program code (the topic was not created). Maybe there is a character limit?

The storage layout conventions do not change the mathematical dimensions of the matrix. In this case (A and B are not transposed):

GEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)

A(m x k) --> LDA is m
B(k x n) --> LDB is k
C(m x n) --> LDC is m

Sorry, I still can’t understand it. For example,
m=2, n=3, k=2;
A={1,2,3,4,5,6};
B={7,8,9,1,2,3};
For row-oriented we have
1 2 3
4 5 6

and

7 8
9 1
2 3

But for column-oriented we have
1 3 5
2 4 6

and

7 1
8 2
9 3

There is code like this in the SDK sample (matrixMul.cu):

//some NVIDIA's code

//note cublas is column primary!
            //need to transpose the order
            cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, uiWB, uiHA, uiWA, &alpha, d_B, uiWB, d_A, uiWA, &beta, d_C, uiWA);

//some NVIDIA's code

 //Performs warmup operation using matrixMul CUDA kernel
		if (block_size == 16) {
            matrixMul>(d_C, d_A, d_B, uiWA, uiWB);

//some NVIDIA's code

So, we know that (A x B)t (transposed) = (B)t x (A)t, and we know that reading a row-major matrix in column-major order is just a matrix transposition. So my code looks correct. Please correct me if I’m wrong.

It looks like the parser cut some code. I meant:
matrixMul<16><<< grid, threads >>>(d_C, d_A, d_B, uiWA, uiWB);

I’m so sorry, I should have read the instructions carefully. Here is the code:

//memory allocation and so on
...
//first multiplication: C = A*B, computed as B*A in column-major terms
 stat = cublasSgemm(handle,
			CUBLAS_OP_N, CUBLAS_OP_N,
			k, m, n,   // M = k, N = m, K = n
			scal,      // alpha (1)
			dev_B, k,  // ld of B = k
			dev_A, n,  // ld of A = n
			(scal+1),  // beta (0)
			dev_C, k); // ld of C = k
...
 //second multiplication: E = C*D, computed as D*C in column-major terms
     stat = cublasSgemm(handle,
			CUBLAS_OP_N, CUBLAS_OP_N,
			l, m, k,   // M = l, N = m, K = k
			scal,      // alpha (1)
			dev_D, l,  // ld of D = l
			dev_C, k,  // ld of C = k
			(scal+1),  // beta (0)
			dev_E, l); // ld of E = l

...

I tried it with
"
const int m=110;
const int n=39;
const int k=112;
const int l=132;
"
and some other parameters, and it works. The computations with large matrices show small discrepancies (CPU and GPU results differ slightly, presumably from floating-point rounding), but it works! Thank you!