# Incorrect call to smbv

I’m unable to properly call `smbv` BLAS function to receive the required result.
This is my example:
`sbmv` is `y = alpha*A*x + beta*y`
A is a 4x4 band matrix in the general band format `{ 0, 8, 9, 5, 1, 2, 3, 4 };`, where `1, 2, 3, 4` is a diagonal and `8, 9, 5` is the 1st superdiagonal (and sub one too). X and Y are vectors of {1, 1, 1, 1}. With `alpha = 2` and `b = 1`, the resulting vector is supposed to be `{19. 39. 35. 19.}`

However, the `cublasSsbmv` result is `{ 35.00 31.00 13.00 15.00 }`, which is not what I need. Could you please help me make the right call?

The code snippet (the full program attached):

``````#define K 1 /*band size*/
#define M 2 /*cols*/
#define N 4 /*rows*/

int sbmv() {
// ......
float a[M * N] = { 0, 8, 9, 5, 1, 2, 3, 4 };
float x[N] = { 1, 1, 1, 1 };
float y[N] = { 1, 1, 1, 1 };
float alpha = 2.0, beta = 1.0;

for (int j = 0; j < N; j++) {
for (int i = 0; i < M; i++) printf("%7.0f", a[IDX2C(j, i, N)]);
printf("\n");
}
printf("----\n");

for (int j = 0; j < N; j++) printf("%7.0f", y[j]);
printf("\n---------------------------------\n");

float* devPtrA, * devPtrX, * devPtrY;
assert(cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);
assert(cudaMalloc((void**)&devPtrA, M * N * sizeof(*a)) == cudaSuccess);
assert(cudaMalloc((void**)&devPtrX, N * sizeof(*a)) == cudaSuccess);
assert(cudaMalloc((void**)&devPtrY, N * sizeof(*a)) == cudaSuccess);
assert(cublasSetMatrix(M, N, sizeof(*a), a, M, devPtrA, M) == CUBLAS_STATUS_SUCCESS);
assert(cublasSetVector(N, sizeof(*a), x, 1, devPtrX, 1) == CUBLAS_STATUS_SUCCESS);
assert(cublasSetVector(N, sizeof(*a), y, 1, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// SBMV call
assert(cublasSsbmv(handle, CUBLAS_FILL_MODE_UPPER, N, K,
&alpha, devPtrA, M, devPtrX, 1, &beta, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// ...

for (int j = 0; j < N; j++) printf("%7.0f ", y[j]);
printf("\n----\n");

return EXIT_SUCCESS;
}
``````

sbmv_call.c (2 KB)

Your banded representation of `A` is not correct. You may wish to refer to the documentation, which states:

If uplo == CUBLAS_FILL_MODE_UPPER then the symmetric banded matrix A is stored column by column, with the main diagonal of the matrix stored in row k+1, the first superdiagonal in row k (starting at second position), the second superdiagonal in row k-1 (starting at third position), etc. So that in general, the element A(i,j)A(i,j) is stored in the memory location A(1+k+i-j,j) for j=1,…,nj=1,…,n and i∈[max(1,j−k),j]i∈[max(1,j−k),j] .

Your A-matrix has 1 subdiagonal and superdiagonal, so k = 1. (note 1-based indexing in this description.) Therefore we start with a matrix organized as you have shown:

``````A =      0, 8, 9, 5
1, 2, 3, 4
``````

and then we store that column-by-column, because cublas is column-major. So if we convert that A representation to a vector, column by column, we have

``````A = 0,1,8,2,9,3,5,4
``````

when I use that ordering, I get your expected output:

``````\$ cat t6.cu
#define K 1 /*band size*/
#define M 2 /*cols*/
#define N 4 /*rows*/
#include <cublas_v2.h>
#include <cassert>
#include <cstdio>
int main() {
// ......
//float a[M * N] = { 0, 8, 9, 5, 1, 2, 3, 4 };
float a[M * N] = { 0, 1, 8, 2, 9, 3, 5, 4 };
float x[N] = { 1, 1, 1, 1 };
float y[N] = { 1, 1, 1, 1 };
float alpha = 2.0, beta = 1.0;
cublasHandle_t handle;
cublasCreate(&handle);

for (int j = 0; j < N; j++) printf("%7.0f", y[j]);
printf("\n---------------------------------\n");

float* devPtrA, * devPtrX, * devPtrY;
assert(cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);
assert(cudaMalloc((void**)&devPtrA, M * N * sizeof(*a)) == cudaSuccess);
assert(cudaMalloc((void**)&devPtrX, N * sizeof(*a)) == cudaSuccess);
assert(cudaMalloc((void**)&devPtrY, N * sizeof(*a)) == cudaSuccess);
assert(cublasSetMatrix(M, N, sizeof(*a), a, M, devPtrA, M) == CUBLAS_STATUS_SUCCESS);
assert(cublasSetVector(N, sizeof(*a), x, 1, devPtrX, 1) == CUBLAS_STATUS_SUCCESS);
assert(cublasSetVector(N, sizeof(*a), y, 1, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// SBMV call
assert(cublasSsbmv(handle, CUBLAS_FILL_MODE_UPPER, N, K,
&alpha, devPtrA, M, devPtrX, 1, &beta, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// ...
cudaMemcpy(y, devPtrY, N*sizeof(*a), cudaMemcpyDeviceToHost);
for (int j = 0; j < N; j++) printf("%7.0f ", y[j]);
printf("\n----\n");

return EXIT_SUCCESS;
}
\$ nvcc -o t6 t6.cu -lcublas
\$ cuda-memcheck ./t6
========= CUDA-MEMCHECK
1      1      1      1
---------------------------------
19      39      35      19
----
========= ERROR SUMMARY: 0 errors
\$
``````
1 Like

I obviously have read the documentation, and it says

The fortran storage format is quite misleading. Now it multiplies correctly
and I can move further. Thank you for the quick response.

Yes, I find having to switch between row-major thinking and column-major thinking very confusing.

You did correctly put the main diagonal in row k+1, and I pointed out that your organization of the A matrix was correct. However that A matrix, represented that way, when stored in column-major order will take on the pattern I indicated, looking through memory in a linear fashion. It’s exactly the same order in linear memory as you would get if you stored that A matrix 2-dimensionally, column-major.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.