Cublas matrix memory layout: problem with cublasSgbmv function


We want to do a matrix-vector multiplication with a band matrix, using cublasSgbmv from the Cublas library. The results we get are not as expected, so we’re probably doing something wrong.

It’s not completely clear to us from the documentation what the expected memory layout of the input matrix A is.

First question: is the expected memory layout of matrix A row major or column major? Since this is a C library, we would expect row major; however, since the library is based on a Fortran library, it could also be column major.

Second, the description says that the matrix should be supplied column by column.

As we understand it, the rows of the matrix have to contain the diagonals, and the columns should contain the elements from each diagonal.

For instance, we have a matrix with 3 super diagonals and 3 sub diagonals. The input matrix then looks as follows:

        col 0, 1, 2, 3, 4, 5, ..., n - 1
row 0:  u3
row 1:  u2
row 2:  u1
row 3:  d
row 4:  l1
row 5:  l2
row 6:  l3

Is this correct?
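
For illustration, the layout described above can be written as index arithmetic. This is only a sketch, assuming the column-major BLAS band convention with 0-based indices, in which element a(i, j) of the full matrix sits in row ku + i - j of column j of the band array. The helpers `band_offset` and `pack_band` are hypothetical names made up for this example and are not part of Cublas.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of full-matrix element a(i, j) inside a column-major band array.
   Indices are 0-based; ku is the number of super diagonals; lda is the
   number of rows of the band array. Row ku holds the main diagonal. */
static size_t band_offset(int i, int j, int ku, int lda)
{
    return (size_t)j * (size_t)lda + (size_t)(ku + i - j);
}

/* Pack an n x n dense column-major matrix into band storage.
   Entries outside the band are ignored; unused band slots are set to 0. */
static void pack_band(const float *dense, int n, int kl, int ku,
                      float *band, int lda)
{
    assert(lda >= kl + ku + 1);
    for (int j = 0; j < n; ++j) {
        /* Clear the whole column of the band array first. */
        for (int r = 0; r < lda; ++r)
            band[(size_t)j * (size_t)lda + (size_t)r] = 0.0f;
        /* Rows of the full matrix that fall inside the band in column j. */
        int i_lo = (j - ku > 0) ? j - ku : 0;
        int i_hi = (j + kl < n - 1) ? j + kl : n - 1;
        for (int i = i_lo; i <= i_hi; ++i)
            band[band_offset(i, j, ku, lda)] = dense[(size_t)j * (size_t)n + (size_t)i];
    }
}
```

With kl = ku = 3 this gives a band array of 7 rows, where row 0 holds u3, row 3 holds d and row 6 holds l3, matching the picture above under the stated assumption.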

And finally, a question about the parameter m, which has to contain the number of rows of matrix A. In this case, the number of rows of the matrix we pass equals the number of sub and super diagonals plus 1, so 7. Or should this be the number of rows of the original full matrix?

Thanks in advance,


As spelled out in the CUBLAS documentation, CUBLAS uses column-major
storage for compatibility with Fortran and Matlab. For banded matrices
it follows established BLAS/LAPACK conventions, as used by the reference
implementations on netlib. For the band matrix storage convention, see

Sorry for the late response.

Thanks for the answer. I focused too much on the documentation of cublasSgbmv itself and missed the information about column-/row-major storage in chapter 1.

One more thing that’s not clear to me is the value of parameter ‘m’ that we need to pass to cublasSgbmv. For our example (from a real world scenario) we need to work with a band matrix of 1507898 x 1507898 elements.

According to the documentation, parameter ‘m’ should be the number of rows of matrix A. Should this be the number of rows of the original matrix (i.e. 1507898) or the number of rows of the matrix that’s actually passed (i.e. kl + ku + 1)?


‘m’ is the number of rows in the uncompressed band matrix.
The number of rows in the compressed band matrix is kl+ku+1.
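
To make the roles of the two row counts concrete, here is a small CPU reference for the non-transposed case y = alpha*A*x + beta*y. It is a sketch under the assumption of BLAS-style column-major band storage with 0-based indices; `gbmv_ref` is a hypothetical name chosen for this example, not a Cublas function. Note that m and n describe the uncompressed matrix, while lda is the row count of the compressed array and must be at least kl+ku+1.

```c
#include <assert.h>
#include <stddef.h>

/* Reference y = alpha*A*x + beta*y for an m x n band matrix A (no transpose).
   m, n   : rows and columns of the uncompressed matrix
   kl, ku : number of sub and super diagonals
   band   : compressed matrix, column-major, lda rows per column
   Element a(i, j) is read from band[j*lda + (ku + i - j)]. */
static void gbmv_ref(int m, int n, int kl, int ku, float alpha,
                     const float *band, int lda,
                     const float *x, float beta, float *y)
{
    assert(lda >= kl + ku + 1);
    /* Scale y first; beta == 0 overwrites whatever y held before. */
    for (int i = 0; i < m; ++i)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];
    for (int j = 0; j < n; ++j) {
        /* Rows of the full matrix that lie inside the band in column j. */
        int i_lo = (j - ku > 0) ? j - ku : 0;
        int i_hi = (j + kl < m - 1) ? j + kl : m - 1;
        for (int i = i_lo; i <= i_hi; ++i)
            y[i] += alpha * band[(size_t)j * (size_t)lda + (size_t)(ku + i - j)] * x[j];
    }
}
```

For the matrix discussed in this thread, that would mean m = n = 1507898, kl = ku = 3 and lda = 7. A routine like this can also serve as a check on small inputs before comparing against the cublasSgbmv result.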