Hi,
I've put together a small demo of my problem (see the attached file). Here is the output of my program:
Initializing CUSPARSE…done
This tests shows that the CUSPARSE format conversion
functions are not working as expected. We have a matrix in device memory that we want to
convert to CSR, but things don’t work correctly. The example below is taking from
page 10 of the CUSPARSE Library PDF. This was tested on CUDA 3.2 Nov 2010.
Yes I know that the matrix is already sparse, but the use case is that we already have a sparse matrix in device
memory that we want to convert to CSR format.
h_A =
1 4 0 0 0
0 2 3 0 0
5 0 0 7 8
0 0 9 0 6
Calling cusparseSnnz with lda = 4. We are using CUSPARSE_DIRECTION_ROW (e.g nnzPerVector stores nnz per row)
nnz= 9 - CORRECT!
h_nnzPerVector - WRONG
1 3 3 2
Should be: 2 2 3 2
Calling cusparseSdense2csr
h_csrValA - WRONG
1 4 7 9 2 5 8 3 6
Should be: 1 4 2 3 5 7 8 9 6
h_csrRowPtrA - WRONG, though first and last enteries are correct
0 1 4 7 9
Should be: 0 2 4 7 9
h_csrColIndA - WRONG
0 0 3 4 1 2 3 1 4
Should be: 0 1 1 2 0 3 4 2 4
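For reference, the "Should be" values above are what a plain host-side conversion gives when h_A is read row by row. The following is only a minimal sketch of that expectation, not the CUSPARSE implementation; the function name dense_row_major_to_csr is made up for illustration, and it assumes element (i, j) lives at a[i * lda + j] with zero-based indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference (not a CUSPARSE routine).
 * Converts a dense m x n matrix stored ROW-major, i.e. element (i, j)
 * at a[i * lda + j], into zero-based CSR. Returns the number of nonzeros.
 * The caller preallocates val and col_ind (one slot per nonzero)
 * and row_ptr (m + 1 entries). */
static int dense_row_major_to_csr(const float *a, int m, int n, int lda,
                                  float *val, int *row_ptr, int *col_ind)
{
    int nnz = 0;
    assert(lda >= n);                 /* a row must fit in the leading dimension */
    for (int i = 0; i < m; ++i) {
        row_ptr[i] = nnz;             /* offset of the first nonzero of row i */
        for (int j = 0; j < n; ++j) {
            float x = a[(size_t)i * lda + j];
            if (x != 0.0f) {
                val[nnz] = x;
                col_ind[nnz] = j;
                ++nnz;
            }
        }
    }
    row_ptr[m] = nnz;                 /* the last entry holds the total nnz */
    return nnz;
}
```

Called with the 4 x 5 h_A above and lda = 5, this yields nnz = 9, values 1 4 2 3 5 7 8 9 6, row pointers 0 2 4 7 9 and column indices 0 1 1 2 0 3 4 2 4.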
So as you can see, the results are just wrong. If we instead do things by column (I forget the exact setup),
then we get the correct results for the above variables. But the problem then is that if you later
want to use your CSR matrix in, say, a call to cusparseScsrmv, you have to specify via transA that the matrix
is transposed. This slows the multiplication to a crawl, and a regular CUBLAS dense multiply is 15x faster! Go figure!
So I think this may be a bug. Any help is greatly appreciated.
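One possibility I have not ruled out is storage order. The sketch below assumes the dense routines read A column-major (Fortran style, element (i, j) at a[j * lda + i]), which would fit the call with lda = 4 for a 4 x 5 matrix. The helper name dense_col_major_to_csr is made up for illustration and is not a CUSPARSE function; it only shows what a conversion would produce under that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side check (not a CUSPARSE routine).
 * Same dense-to-CSR conversion, but the buffer is read COLUMN-major:
 * element (i, j) is taken from a[j * lda + i], with lda >= m.
 * Returns the number of nonzeros; outputs are preallocated by the caller. */
static int dense_col_major_to_csr(const float *a, int m, int n, int lda,
                                  float *val, int *row_ptr, int *col_ind)
{
    int nnz = 0;
    assert(lda >= m);                 /* a column must fit in the leading dimension */
    for (int i = 0; i < m; ++i) {
        row_ptr[i] = nnz;             /* offset of the first nonzero of row i */
        for (int j = 0; j < n; ++j) {
            float x = a[(size_t)j * lda + i];
            if (x != 0.0f) {
                val[nnz] = x;
                col_ind[nnz] = j;
                ++nnz;
            }
        }
    }
    row_ptr[m] = nnz;                 /* the last entry holds the total nnz */
    return nnz;
}
```

Feeding it the same row-major h_A buffer with m = 4, n = 5, lda = 4 gives values 1 4 7 9 2 5 8 3 6, row pointers 0 1 4 7 9 and column indices 0 0 3 4 1 2 3 1 4, which are exactly the results marked WRONG in the output above. If that assumption holds, storing h_A column-major before copying it to the device might be enough, but I have not verified this against the library.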
cusparse-conversion-test-krunal.zip (9.83 KB)