Proper use of cusparseScsrmv - x and y sizes?

Hello ,

I want to solve the Ax=B system.
And I want to use cusparseScsrmv function.
I haven’t quite understand how to use it though.

I checked here

I tried to do in my code:

int nz = 0 , *I = NULL, *J = NULL;
    float *val = NULL;
    int *d_col, *d_row;
    float *d_val, *d_x,*d_Ax;
    float alpha, betaA;
    I = (int *)malloc(sizeof(int)*(N+1));
    J = (int *)malloc(sizeof(int)*nz);
     /* Get handle to the CUSPARSE context */
    cusparseHandle_t cusparseHandle = 0;
    cusparseStatus_t cusparseStatus;
    cusparseStatus = cusparseCreate(&cusparseHandle);

    cusparseMatDescr_t descr = 0;
    cusparseStatus = cusparseCreateMatDescr(&descr);


    gpuErrchk( cudaMalloc((void **)&d_col, nz*sizeof(int)) );
    gpuErrchk( cudaMalloc((void **)&d_row, (n+1)*sizeof(int)) );
    gpuErrchk( cudaMalloc((void **)&d_val, nz*sizeof(float)) );
    gpuErrchk( cudaMalloc((void **)&d_x, (C[N-1] + R[N-1]) *sizeof(float)) );
    gpuErrchk( cudaMalloc((void **)&d_Ax, N *sizeof(float)) );

    gpuErrchk( cudaMemcpy(d_col, J, nz*sizeof(int), cudaMemcpyHostToDevice) );
    gpuErrchk( cudaMemcpy(d_row, I, (N+ 1)*sizeof(int), cudaMemcpyHostToDevice) );
    gpuErrchk( cudaMemcpy(d_val, val, nz*sizeof(float), cudaMemcpyHostToDevice) );
    gpuErrchk( cudaMemcpy(d_x, X, (C[N-1] + R[N-1]) *sizeof(float), cudaMemcpyHostToDevice) );

    alpha = 1.0;
    beta = -1.0;

cusparseScsrmv(cusparseHandle,CUSPARSE_OPERATION_NON_TRANSPOSE, NEl, (C[N-1] + R[N-1]), nz, &alpha, descr, d_val, d_row, d_col, d_x, &beta, d_Ax);

gpuErrchk( cudaMemcpy(X, d_x, (C[N-1] + R[N-1]) *sizeof(float), cudaMemcpyDeviceToHost) );

Is this the right implementation for taking as result the X ?

  1. What about beta?Is it right that have it -1 ? When do I have to put it zero?

  2. It says in docs:

x 	<type> vector of n elements if op ( A ) = A

(it’s d_x in my code)

y 	<type> vector of m elements if op ( A ) = A

(it’s d_Ax in mycode)

but then , what should be the dimensions of d_x and d_Ax ?The one says about column elements (n) and the other about row elements (m) .

When I am going to copy :

gpuErrchk( cudaMemcpy(X, d_x, (C[N-1] + R[N-1]) *sizeof(float), cudaMemcpyDeviceToHost) );

what size should I put?

  1. What about d_val?How should I use it?
csrValA 	<type> array of nnz ( = csrRowPtrA(m) - csrRowPtrA(0) ) nonzero elements of matrix A.

I used the example from samples which uses d_val (crsVal) t like in the code above.

Thank you!

First of all some terminology. When we talk about solving a system we often mean something like:

Ax = b

A is a large sparse matrix (it is given/specified)

b is the “Right hand side” vector (it is given/specified)

x is the solution vector (an unknown)

It’s customary to use lower case for vectors and uppercase for matrices. You can do whatever you want of course, but when you write something like this:

Ax = B

then I have to ask is B a vector?

Moving on, the csrmv cusparse function is a matrix-vector multiply function, nothing more. If I were given A and x, I could compute b using it. But that is usually not what we’re trying to accomplish when we say “solve Ax = b”. If we are given A and b, and want to find x, it’s not a simple matter of just using a matrix-vector multiply. We must use a solution method (e.g. “conjugate gradient” if our matrix A meets certain requirements) which may well use a csrmv type function in the process of finding a solution. If you’re thinking csrmv can do something like “matrix left divide” in Matlab, it cannot.

If you want to see how csrmv can be properly set up to do a matrix-vector multiply, in the overall context of a conjugate gradient solution of Ax=b (where A and b are known, and x is unknown) take a look at the conjugate gradient cuda sample code:

I’d rather not try to sort out your code, because it seems evident to me that you are trying to use csrmv to find x when given A and b in this equation:

Ax = b

and csrmv cannot do that by itself.

Hello and thanks for the help.

I want to compute the solution of Ax=b , hence the x vector , using conjugate gradient (ok for the uppercase letter,I missed it).

So, in order to understand , the cusparseScsrm computes b ? And then all the calls to cublasSaxpy and cublasSscal etc and the while loop is for solving x?

So , I must use the same procedure to solve x?


Yes, the cusparseScsrmv computes A times x (or in an iterative solution, x “hat”, which is the estimate of the solution vector). A condensed iterative solution sequence is typically something like:

  1. take an initial guess at x (might be a vector of all 1’s, for example), call it _x
  2. compute A times _x (using csrmv, for example, if A is a CSR sparse matrix)
  3. the result of step 2, let’s call it _b (i.e. A times _x = _b), is then compared to the actual right-hand-side vector b. The method of comparison varies by the solution method. The result of the comparison may be called a “residual” and we can think of this residual as the “difference” between b, our desired RHS vector, and _b, our calculated RHS vector
  4. based on this residual, and the solution method being used (e.g. conjugate gradient), we compute a new estimate of x, called _x, and we repeat the above process starting with step 2.
  5. we continue the above iterative calculation (steps 2-4), until the residual meets some convergence criterion (e.g. the difference between _b and b is small).
  6. once we have “converged”, then we stop and call our most recent _x the “solution vector x”.

So the csrmv function may only be used in step 2 (it may also be used in other places, depending on the solution method).

Rather than trying to cover this kind of a tutorial here, you can read about stuff like this on places like wikipedia, where you will find much better explanations than what I have given.

especially “The conjugate gradient method as an iterative method”

Also note that conjugate gradient solution method is only typically used when A is SPD (symmetric, positive definite). Again, please read the available literature rather than asking for an explanation of what SPD is here.

Ok , thanks ,I will study and try to make it work.

One question…

Is this method (that is used in nvidia samples) used for dense matrices also?
And what about if I don’t know how many non zero elements it has?

You can use CG as a general method for a dense matrix A. However you wouldn’t use CUSPARSE. You would leave your dense matrix in ordinary dense matrix form, and use corresponding functions out of CUBLAS (e.g dense matrix-vector multiply cublasgemv())

In the sparse matrix case, if you have a dense matrix that you are converting to sparse, then you will be counting the number of non-zero elements it has in the process of conversion to sparse representation form. If you’re talking about the dense matrix case, there is no need to know how many non-zero elements it has.

Let me understand.

If I have a dense matrix and I want to use CG method.

  1. Can I use cusparsennz,which says it counts the nnz elements in a dense matrix and then use the code from nvidia samples ? (regarding the CG)

But if I use this , I don’t know how to handle csrValA , csrRowPtrA ,csrColIndA (

  1. You said if I have a dense matrix and convert it to sparse.How can I do that?Because I only found converting sparse to dense in cusparse documentation.

Can you help me please with a piece of code regarding the csrValA , csrRowPtrA ,csrColIndA ? I don’t understand how to use them if I have a matrix that I have created in my code.

Thank you for your help.

I opened another post

also , where I am using cublas function in order to compute Ax instead of cusparse function.

But if you can answer me to the questions above ,I will be grateful!


To convert a dense matrix to CSR sparse format, you could use cusparse dense2csr:

You could also write a routine yourself to do it.

Ok, thanks a lot!