CUSPARSE conversion routines not working... cusparseSnnz and cusparseSdense2csr misbehaving...

lateralpunk · December 1, 2010, 7:37pm

Hi,
I’ve put together a little demo of my problem. See the attached file. Here is the output of my program:

Initializing CUSPARSE…done
This tests shows that the CUSPARSE format conversion
functions are not working as expected. We have a matrix in device memory that we want to
convert to CSR, but things don’t work correctly. The example below is taking from
page 10 of the CUSPARSE Library PDF. This was tested on CUDA 3.2 Nov 2010.
Yes I know that the matrix is already sparse, but the use case is that we already have a sparse matrix in device
memory that we want to convert to CSR format.
h_A =
1 4 0 0 0
0 2 3 0 0
5 0 0 7 8
0 0 9 0 6

Calling cusparseSnnz with lda = 4. We are using CUSPARSE_DIRECTION_ROW (e.g nnzPerVector stores nnz per row)
nnz= 9 - CORRECT!

h_nnzPerVector - WRONG
1 3 3 2
Should be: 2 2 3 2

Calling cusparseSdense2csr

h_csrValA - WRONG
1 4 7 9 2 5 8 3 6
Should be: 1 4 2 3 5 7 8 9 6

h_csrRowPtrA - WRONG, though first and last enteries are correct
0 1 4 7 9
Should be: 0 2 4 7 9

h_csrColIndA - WRONG
0 0 3 4 1 2 3 1 4
Should be: 0 1 1 2 0 3 4 2 4

So as you can see the results are just wrong. If we instead do things by column (forgot the exact setup)
then we will get the correct results for the above variables. But the problem then is that if you later
want to use your CSR in say a call to cusparseScsrmv then you would have to specify for transA that the matrix
is a transpose. This brings the multiplication down to a crawl, and a regular CUBLAS dense multiply is 15x faster! Go Figure!

So I think this may be a bug. Any help greatly appreciated
cusparse-conversion-test-krunal.zip (9.83 KB)

Dittoaway · December 1, 2010, 10:51pm

You’ve allocated nnzPerVector size m instead of mxn

Cliff_Woolley · December 2, 2010, 2:06am

Hi there,

One of the CUSPARSE engineers took a look at this for us, and here’s what he says:

Hope this helps.

Thanks,

Cliff

Dittoaway · December 2, 2010, 2:56pm

This is a failure of the CUSPARSE documentation.

In the CUBLAS documentation, we find the following:

“For maximum compatibility with existing Fortran environments,
CUBLAS uses columnâ€major storage and 1â€based indexing. Since C
and C++ use rowâ€major storage, applications cannot use the native
array semantics for twoâ€dimensional arrays. Instead, macros or inline
functions should be defined to implement matrices on top of onedimensional
arrays.”

Such a statement should be at the beginning of the CUSPARSE documentation.

Further confusion is introduced by footnotes throughout the CUSPARSE documentation saying that various arrays are in row-major format.
For instance, on page 8;

“Note: It is assumed that the indices are given in row-major format …”

Cliff_Woolley · December 2, 2010, 6:33pm

Thanks for the feedback. I’ll pass this along to the CUSPARSE team.

lateralpunk · December 3, 2010, 3:26am

Yes I guess I was under the impression that CUSPARSE worked in row-major. Well anyways, good to know that its not a bug! Ok, so let me try to get it to work now that I know things are done per column.

One quesiton,
what would be the best conversion route to make a call to cusparseScsrmv the most efficient e.g. I did get things working with the above code by fiddling with the variables but then had to do TRANSPOSE for cusparseScsrmv which slowed down my app to a crawl. My question is I want to use NON_TRANSPOSE (which BTW the doc says is the only one that is supported, though I’ve got it to work with just TRANSPOSE). Using NON_TRANSPOSE should be faster right?

lateralpunk · December 3, 2010, 3:36am

I guess I should clarify my above quesiton. What would I need to do if my matrix is stored in row-major format in device memory and I want to convert it to CSR? assume I have a mXn, could you tell me the paramaters for the conversion calls and for cusparseScsrmv. I’m guessing for cusparseScsrmv I would then have to use TRANSPOSE, right? And if I do, then things will be slow (I’ve already proved this in my local test 20x for CUBLAS mv versus 5x for CUSPARSE mv)…

Would it be best to convert my device matrix into col - major to get the benefit of cusparseScsrmv?? Is there a CUBLAS method for matrix transpose. I know there is an SDK example…

thanks!

Cliff_Woolley · December 4, 2010, 1:15am

Try the following:

Assuming that the matrix is not a sub-matrix of another larger matrix (i.e., assuming rows are stored contiguously), then:

Think of dense mxn matrix A stored in memory in row-major format (lda=m), as a dense nxm matrix At stored in memory in column-major format (lda=n).

A =

[ row 1 ]

[ row 2 ]

[ row 3 ]

Convert it to CSC format (using nnz with CUSPARSE_DIRECTION_COLUMN and dense2csc).

So far we have simply created a matrix in CSC format corresponding to At.

Just reinterpret the CSC storage as CSR storage (cscColPtr => csrRowPtr, cscRowInd=> csrColInd, cscVal =>csrVal) when calling csrmv, which will implicitly be transposing the matrix, and you should have the result of the multiplication by the original A.

Notice that by reinterpreting CSC as CSR format we are implicitly transposing the matrix back.

Hope this helps,

Cliff

lateralpunk · December 4, 2010, 7:25am

Hi Cliff,
thanks for your detailed response. I will try this, but it’s going to be hard for me to finish off my paper (Text mining on GPU) and try to recode things. I guess I want to ask one final thing. You talk of “implicit transpose”. All I want to know is will I see a huge hit on performance when I call csrmv? Because as I mentioned, I did get things working (not by your mechnamis of using CSC) but I was hit big time when using csrmv and TRANSPOSE.

lateralpunk · December 5, 2010, 2:25am

Can’t say I got it to work. Before I post the code for what I think Cliff meant, here is some code that corresponds to using TRANSPOSE on the call to csrmv. Doing it this way makes things work though makes my app more than 15x slower.

int nnz = 0;

  int *d_nnzPerVector; 

int m = cd->numDocuments; //the rows

  int n = cd->numTerms;  //the columns

  int count_nnzPerVector = n;

  cutilSafeCall( cudaMalloc((void**)&d_nnzPerVector, count_nnzPerVector*sizeof(*d_nnzPerVector) ) );

  cudaMemset(d_nnzPerVector, -1, count_nnzPerVector*sizeof(*d_nnzPerVector));

  if (CUSPARSE_STATUS_SUCCESS != cusparseSnnz(g_cusparse_handle,CUSPARSE_DIRECTION_ROW,

      n,m,cd->cusparseAMatDesc,cd->A,n,d_nnzPerVector,&nnz))

  {

    printf("Error: Couldn't initialize conversion of dense to sparse matrix.\n");

    exit(-1);

  }

cutilSafeCall( cudaMalloc((void**)&cd->d_csrValA, nnz*sizeof(*cd->d_csrValA) ) );

  cutilSafeCall( cudaMalloc((void**)&cd->d_csrColIndA, nnz*sizeof(*cd->d_csrColIndA) ) );

  cutilSafeCall( cudaMalloc((void**)&cd->d_csrRowPtrA, (n+1)*sizeof(*cd->d_csrRowPtrA) ) );

if (CUSPARSE_STATUS_SUCCESS != cusparseSdense2csr(g_cusparse_handle,n,m,cd->cusparseAMatDesc,cd->A,

       n,d_nnzPerVector,cd->d_csrValA,cd->d_csrRowPtrA,cd->d_csrColIndA))

  {

    printf("Error: Couldn't convert dense to sparse matrix.\n");

    exit(-1);

  }

.......

///////////////////////////////////////////////////

//The call to csrmv

  int m = cd->numDocuments;

  int n = cd->numTerms;

  if (CUSPARSE_STATUS_SUCCESS != cusparseScsrmv(g_cusparse_handle,CUSPARSE_OPERATION_TRANSPOSE,

  n,m,1.0f,cd->cusparseAMatDesc,cd->d_csrValA,cd->d_csrRowPtrA,cd->d_csrColIndA,

  cd->d_query,1.0f,cd->d_dochits))

{

    printf("Error: couldn't perform matrix-vector multiply.\n");

    return false;

  }

.......

which results in the following:

0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.5000  0.5000  0.0000  0.5000  0.5000  0.0000  

0.0000  0.7789  0.0000  0.0000  0.0000  0.4435  0.0000  0.0000  0.4435  0.0000  0.0000  0.0000  

0.0000  0.0000  0.0000  0.7120  0.4054  0.4054  0.0000  0.0000  0.0000  0.0000  0.0000  0.4054  

0.2641  0.0000  0.9277  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.2641  

0.5774  0.0000  0.0000  0.0000  0.5774  0.0000  0.0000  0.0000  0.5774  0.0000  0.0000  0.0000  

h_nnzPerVector

2 1 1 1 2 2 1 1 2 1 1 2 

nnz= 17

h_csrValA

0.2641 0.5774 0.7789 0.9277 0.7120 0.4054 0.5774 0.4435 0.4054 0.5000 0.5000 0.4435 0.5774 0.5000 0.5000 0.4054 0.2641 

h_csrRowPtrA

0 2 3 4 5 7 9 10 11 13 14 15 17 

h_csrColIndA

3 4 1 3 2 2 4 1 2 0 0 1 4 0 0 2 3

Now I"m going to (re)try what Cliff mentioned, but I don’t see how it will work. If we call csrmv passing it csc type paramaters the # of cells in the row vector are different between the both.

lateralpunk · December 8, 2010, 6:41pm

I’ve come to the conclusion that if one is going to use CUBLAS or CUSPARSE, one should store their matrices in column-major format as required by the docs. This avoids any expensive transpose operation. At present I’m getting 20x speedup using CUBLAS, row-major + transpose, sgemv. I’m going to covert this into CUSPARSE, column-major, no transpose, sgemv, and will put the results here once I’m done.

g_hinojos · February 28, 2011, 7:54pm

could you describe your d_nnzRowVector please! I quite not understand the values of this vector. THANKS

Topic		Replies	Views
Problem in basic dense to csr format conversion using CUSPARSE GPU-Accelerated Libraries	3	944	July 28, 2015
cusparseScsrmv transpose mode is not working CUDA Programming and Performance	17	1568	July 9, 2018
Problem with cusparseXcoo2csr and cuda-gdb CUDA Programming and Performance	9	1469	January 12, 2012
Function to convert a Sparse matrix to CSR format GPU-Accelerated Libraries	2	1203	August 10, 2018
cannot get proper results for cusparseSnnz GPU-Accelerated Libraries	1	1043	January 6, 2016
Sparse Matrix-Vector Multiplication on CUDA CUDA Programming and Performance	79	313693	November 22, 2010
Memory read error when using csrmv with transpose operation Legacy PGI Compilers	8	3568	March 8, 2019
cuSPARSE generic procedure could not be resolved NVFORTRAN-S-0155 nvc, nvc++ and nvfortran cuda	9	832	November 22, 2021
cusparse<t>dense2csr for 2D array CUDA Programming and Performance	8	1688	December 1, 2016
Problem of two large sparse matrices multiplication in cuParse CUDA Programming and Performance	6	3727	November 21, 2016

CUSPARSE conversion routines not working... cusparseSnnz and cusparseSdense2csr misbehaving...

Related topics