Incorrect call to smbv

I’m unable to properly call smbv BLAS function to receive the required result.
This is my example:
sbmv is y = alpha*A*x + beta*y
A is a 4x4 band matrix in the general band format { 0, 8, 9, 5, 1, 2, 3, 4 };, where 1, 2, 3, 4 is a diagonal and 8, 9, 5 is the 1st superdiagonal (and sub one too). X and Y are vectors of {1, 1, 1, 1}. With alpha = 2 and b = 1, the resulting vector is supposed to be {19. 39. 35. 19.}

However, the cublasSsbmv result is { 35.00 31.00 13.00 15.00 }, which is not what I need. Could you please help me make the right call?

The code snippet (the full program attached):

#define K 1 /*band size*/
#define M 2 /*cols*/
#define N 4 /*rows*/

int sbmv() {
// ......
    float a[M * N] = { 0, 8, 9, 5, 1, 2, 3, 4 };
    float x[N] = { 1, 1, 1, 1 };
    float y[N] = { 1, 1, 1, 1 };
    float alpha = 2.0, beta = 1.0;

    for (int j = 0; j < N; j++) {
        for (int i = 0; i < M; i++) printf("%7.0f", a[IDX2C(j, i, N)]);
        printf("\n");
    }
    printf("----\n");

    for (int j = 0; j < N; j++) printf("%7.0f", y[j]);
    printf("\n---------------------------------\n");

    float* devPtrA, * devPtrX, * devPtrY;
    assert(cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);
    assert(cudaMalloc((void**)&devPtrA, M * N * sizeof(*a)) == cudaSuccess);
    assert(cudaMalloc((void**)&devPtrX, N * sizeof(*a)) == cudaSuccess);
    assert(cudaMalloc((void**)&devPtrY, N * sizeof(*a)) == cudaSuccess);
    assert(cublasSetMatrix(M, N, sizeof(*a), a, M, devPtrA, M) == CUBLAS_STATUS_SUCCESS);
    assert(cublasSetVector(N, sizeof(*a), x, 1, devPtrX, 1) == CUBLAS_STATUS_SUCCESS);
    assert(cublasSetVector(N, sizeof(*a), y, 1, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

    // SBMV call
    assert(cublasSsbmv(handle, CUBLAS_FILL_MODE_UPPER, N, K,
        &alpha, devPtrA, M, devPtrX, 1, &beta, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// ...

    for (int j = 0; j < N; j++) printf("%7.0f ", y[j]);
    printf("\n----\n");

    return EXIT_SUCCESS;
}

sbmv_call.c (2 KB)

Your banded representation of A is not correct. You may wish to refer to the documentation, which states:

If uplo == CUBLAS_FILL_MODE_UPPER then the symmetric banded matrix A is stored column by column, with the main diagonal of the matrix stored in row k+1, the first superdiagonal in row k (starting at second position), the second superdiagonal in row k-1 (starting at third position), etc. So that in general, the element A(i,j)A(i,j) is stored in the memory location A(1+k+i-j,j) for j=1,…,nj=1,…,n and i∈[max(1,j−k),j]i∈[max(1,j−k),j] .

Your A-matrix has 1 subdiagonal and superdiagonal, so k = 1. (note 1-based indexing in this description.) Therefore we start with a matrix organized as you have shown:

A =      0, 8, 9, 5
         1, 2, 3, 4

and then we store that column-by-column, because cublas is column-major. So if we convert that A representation to a vector, column by column, we have

A = 0,1,8,2,9,3,5,4

when I use that ordering, I get your expected output:

$ cat t6.cu
#define K 1 /*band size*/
#define M 2 /*cols*/
#define N 4 /*rows*/
#include <cublas_v2.h>
#include <cassert>
#include <cstdio>
int main() {
// ......
    //float a[M * N] = { 0, 8, 9, 5, 1, 2, 3, 4 };
    float a[M * N] = { 0, 1, 8, 2, 9, 3, 5, 4 };
    float x[N] = { 1, 1, 1, 1 };
    float y[N] = { 1, 1, 1, 1 };
    float alpha = 2.0, beta = 1.0;
    cublasHandle_t handle;
    cublasCreate(&handle);

    for (int j = 0; j < N; j++) printf("%7.0f", y[j]);
    printf("\n---------------------------------\n");

    float* devPtrA, * devPtrX, * devPtrY;
    assert(cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS);
    assert(cudaMalloc((void**)&devPtrA, M * N * sizeof(*a)) == cudaSuccess);
    assert(cudaMalloc((void**)&devPtrX, N * sizeof(*a)) == cudaSuccess);
    assert(cudaMalloc((void**)&devPtrY, N * sizeof(*a)) == cudaSuccess);
    assert(cublasSetMatrix(M, N, sizeof(*a), a, M, devPtrA, M) == CUBLAS_STATUS_SUCCESS);
    assert(cublasSetVector(N, sizeof(*a), x, 1, devPtrX, 1) == CUBLAS_STATUS_SUCCESS);
    assert(cublasSetVector(N, sizeof(*a), y, 1, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

    // SBMV call
    assert(cublasSsbmv(handle, CUBLAS_FILL_MODE_UPPER, N, K,
        &alpha, devPtrA, M, devPtrX, 1, &beta, devPtrY, 1) == CUBLAS_STATUS_SUCCESS);

// ...
    cudaMemcpy(y, devPtrY, N*sizeof(*a), cudaMemcpyDeviceToHost);
    for (int j = 0; j < N; j++) printf("%7.0f ", y[j]);
    printf("\n----\n");

    return EXIT_SUCCESS;
}
$ nvcc -o t6 t6.cu -lcublas
$ cuda-memcheck ./t6
========= CUDA-MEMCHECK
      1      1      1      1
---------------------------------
     19      39      35      19
----
========= ERROR SUMMARY: 0 errors
$
1 Like

I obviously have read the documentation, and it says

The fortran storage format is quite misleading. Now it multiplies correctly
and I can move further. Thank you for the quick response.

Yes, I find having to switch between row-major thinking and column-major thinking very confusing.

You did correctly put the main diagonal in row k+1, and I pointed out that your organization of the A matrix was correct. However that A matrix, represented that way, when stored in column-major order will take on the pattern I indicated, looking through memory in a linear fashion. It’s exactly the same order in linear memory as you would get if you stored that A matrix 2-dimensionally, column-major.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.