cusparse<t>csrmv_mp() when the operation is CUSPARSE_OPERATION_TRANSPOSE

apoorva.gupta · January 8, 2018, 4:11pm

Hello

I am trying to run cusparse<t>csrmv_mp() with the TRANSPOSE operation, which was recently introduced in toolkit version 9 (only the NON_TRANSPOSE version was available in 8), but it returns the error CUSPARSE_STATUS_INVALID_VALUE. Since the arguments have not changed between cusparse<t>csrmv() and cusparse<t>csrmv_mp(), I don’t know why I am facing this issue. Moreover, the NON_TRANSPOSE operation with the same function runs just fine.

I am using a 940M card (cc 5.0) with driver version 387.34 and toolkit version 9.1.85 on Ubuntu 16.04 LTS.

Hope to find the answer to this issue soon.

Thanks

Robert_Crovella · January 8, 2018, 4:17pm

Maybe you should provide a complete code that someone else could test.

If your A matrix is not square, I don’t think the arguments to the transpose and non-transpose version would be the same.
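To make the dimension point concrete, here is a minimal host-side sketch in plain C++ of y = alpha * op(A) * x + beta * y for a CSR matrix. It is not a cuSPARSE routine: the name csrmv_ref and its argument order are made up for illustration. For an m x n matrix, the non-transpose product reads n entries of x and writes m entries of y, while the transpose product reads m and writes n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (NOT part of cuSPARSE):
// returns alpha * op(A) * x + beta * y for an m x n CSR matrix
// with zero-based indices. op(A) is A or A^T depending on `transpose`.
std::vector<float> csrmv_ref(bool transpose, int m, int n, float alpha,
                             const std::vector<float>& val,
                             const std::vector<int>& rowPtr,
                             const std::vector<int>& colInd,
                             const std::vector<float>& x,
                             float beta, std::vector<float> y)
{
    // The vector lengths swap between the two operation types.
    const std::size_t xLen = static_cast<std::size_t>(transpose ? m : n);
    const std::size_t yLen = static_cast<std::size_t>(transpose ? n : m);
    assert(x.size() == xLen);
    assert(y.size() == yLen);
    assert(rowPtr.size() == static_cast<std::size_t>(m) + 1);

    // Scale the incoming y; beta == 0 overwrites y outright.
    for (float& v : y) v = (beta == 0.0f) ? 0.0f : beta * v;

    for (int row = 0; row < m; ++row) {
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k) {
            const int col = colInd[k];
            if (transpose)
                y[col] += alpha * val[k] * x[row];  // scatter into column
            else
                y[row] += alpha * val[k] * x[col];  // gather along the row
        }
    }
    return y;
}
```

For the 2 x 3 matrix with rows (1 0 2) and (0 3 0), the transpose call takes an x of length 2 and returns a y of length 3; with x all ones that y is (1, 3, 2). For a square matrix such as a 5 x 5 one, both lengths coincide, so the argument sizes alone would not explain an error.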

apoorva.gupta · January 8, 2018, 5:07pm

Hello

Well, I am using this function in a project, so it doesn’t make sense to post the whole thing here, but here is a small test code I wrote that gives the same problem:

#include <cuda_runtime.h>
#include <iostream>
#include <cusparse.h>
#include <assert.h>
using namespace std;

/*
The A matrix here is

1 0 2 0 3
0 4 0 5 0
0 0 6 0 0
0 7 0 8 0
9 0 10 0 11

the vector x is

1
1
1
1
1

*/

__global__ void d_set_value(float* rowVector_d , float value, int num_elements){
int i = threadIdx.x + blockIdx.x*blockDim.x;

if (i<num_elements)
    rowVector_d[i] = value;

}

int main(int argc,char **argv)
{
cusparseStatus_t cusparseStat = CUSPARSE_STATUS_SUCCESS;
// alloc and init input arrays on host (CPU)
int n = 11;
float *csrval = new float[n];
for(int i=0; i<n; i++) csrval[i] = i+1;

int* csrcol = new int[n];
csrcol[0] = 0;
csrcol[1] = 2;
csrcol[2] = 4;
csrcol[3] = 1;
csrcol[4] = 3 ;  
csrcol[5] = 2;
csrcol[6] = 1;
csrcol[7] = 3;
csrcol[8] = 0;
csrcol[9] = 2;
csrcol[10] = 4;

int* csrrow = new int[6];

csrrow[0] = 0;
csrrow[1] = 3;
csrrow[2] = 5;
csrrow[3] = 6;
csrrow[4] = 8;
csrrow[5] = 11;

float* csrval_d;
int *csrcol_d, *csrrow_d;

cudaMalloc((void**)&csrval_d , n*sizeof(float));
cudaMalloc((void**) &csrrow_d , 6*sizeof(int));
cudaMalloc((void**) &csrcol_d , n*sizeof(int));

cudaMemcpy(csrrow_d , csrrow, 6*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(csrcol_d , csrcol, n*sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(csrval_d , csrval, n*sizeof(float), cudaMemcpyHostToDevice);

float* rowvector_d;
cudaMalloc((void**)&rowvector_d , 5*sizeof(float));

dim3 block = dim3(1024,1,1);
int grid_x = (n + block.x - 1)/block.x;
int grid_y = 1;
int grid_z = 1;
dim3 grid = dim3(grid_x, grid_y, grid_z);
d_set_value <<<grid, block>>> (rowvector_d, 1, 5);

cusparseMatDescr_t descrA;
cusparseCreateMatDescr(&descrA);
cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
cusparseSetMatIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);

cusparseHandle_t cusparseHandle;
cusparseCreate(&cusparseHandle);

float alpha = 1.0;
float beta = 0.0;

float *norm_d;
cudaMalloc((void**)&norm_d , 5*sizeof(float));
cudaMemset(norm_d , 0, 5*sizeof(float));

cusparseStat = cusparseScsrmv_mp(cusparseHandle, CUSPARSE_OPERATION_TRANSPOSE, 
    5, 5, 11, &alpha, descrA, csrval_d , 
    csrrow_d, csrcol_d, rowvector_d, &beta, norm_d);

if (CUSPARSE_STATUS_SUCCESS != cusparseStat)
    std::cout << cusparseStat << std::endl;
else
{
cudaMemcpy(csrval , norm_d, 5*sizeof(float) , cudaMemcpyDeviceToHost);                       
for(int i=0; i<5; i++) cout << i << ": " << csrval[i] << endl;
}

}

I hope this helps.

Thanks

Robert_Crovella · January 8, 2018, 8:24pm

It appears that this cusparse operation does not support the transpose operation type. The documentation is in error and should be updated at the next release.

I don’t believe there are any plans to support the transpose operation type.

It’s recommended that you transpose the matrix separately, first, if you want the transpose operation.
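One way to read "transpose the matrix separately" is to build the CSR arrays of A^T once and then call the routine with CUSPARSE_OPERATION_NON_TRANSPOSE. On the GPU, the cuSPARSE of this toolkit generation provides cusparse<t>csr2csc for that conversion, since the CSC arrays of A are the CSR arrays of A^T. The sketch below shows the same idea on the host in plain C++; the CsrMatrix struct and the csr_transpose name are made up for illustration and are not part of cuSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for a zero-based CSR matrix (illustration only).
struct CsrMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<float> val;
    std::vector<int> rowPtr;  // length rows + 1
    std::vector<int> colInd;  // length nnz
};

// Returns the CSR representation of A^T using a counting sort on columns.
CsrMatrix csr_transpose(const CsrMatrix& a)
{
    const std::size_t nnz = a.val.size();
    assert(a.rowPtr.size() == static_cast<std::size_t>(a.rows) + 1);
    assert(a.colInd.size() == nnz);

    CsrMatrix t;
    t.rows = a.cols;
    t.cols = a.rows;
    t.val.resize(nnz);
    t.colInd.resize(nnz);
    t.rowPtr.assign(static_cast<std::size_t>(a.cols) + 1, 0);

    // Count the entries in each column of A, shifted by one slot.
    for (std::size_t k = 0; k < nnz; ++k) ++t.rowPtr[a.colInd[k] + 1];

    // Prefix sum turns the counts into row offsets of A^T.
    for (int c = 0; c < a.cols; ++c) t.rowPtr[c + 1] += t.rowPtr[c];

    // Next free slot in each row of A^T.
    std::vector<int> next(t.rowPtr.begin(), t.rowPtr.end() - 1);

    // Walk A row by row, so each row of A^T ends up sorted by column.
    for (int r = 0; r < a.rows; ++r) {
        for (int k = a.rowPtr[r]; k < a.rowPtr[r + 1]; ++k) {
            const int dst = next[a.colInd[k]]++;
            t.val[dst] = a.val[k];
            t.colInd[dst] = r;
        }
    }
    return t;
}
```

Applied to the 5 x 5 test matrix from this thread, the transposed row pointer is 0, 2, 4, 7, 9, 11. A non-transpose multiply of that result with a vector of ones gives the column sums of A, which are 10, 11, 18, 13, 14, a handy check for whichever GPU path is used.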

You could also revert to using cusparseScsrmv instead; however, that operation is also quite a bit slower when using the transpose operation type.
