cusparse<t>csrmv_mp() when the operation is CUSPARSE_OPERATION_TRANSPOSE

Hello

So, I am trying to run cusparse<t>csrmv_mp() with the TRANSPOSE operation, which was introduced in toolkit version 9 (only the NON_TRANSPOSE version was available in 8). The problem is that it returns the error CUSPARSE_STATUS_INVALID_VALUE. The arguments are the same for cusparse<t>csrmv() and cusparse<t>csrmv_mp(), so I don't know why I am facing this issue. Moreover, the NON_TRANSPOSE operation with the same kernel runs just fine.

I am using a GeForce 940M card (compute capability 5.0) with driver version 387.34 and toolkit version 9.1.85 on Ubuntu 16.04 LTS.

I hope to find an answer to this issue soon.

Thanks

Maybe you should provide complete code that someone else could test.

If your A matrix is not square, I don't think the arguments to the transpose and non-transpose versions would be the same.

Hello

Well, I am using this function in a larger project, so it doesn't make sense to post it here, but here is a small test program I wrote that reproduces the same problem:

```cuda
#include <cuda_runtime.h>
#include <iostream>
#include <cusparse.h>
#include <assert.h>
using namespace std;

/*
The A matrix here is

1 0 2 0 3
0 4 0 5 0
0 0 6 0 0
0 7 0 8 0
9 0 10 0 11

the vector x is

1
1
1
1
1

*/

// Set the first num_elements entries of rowVector_d to value.
__global__ void d_set_value(float* rowVector_d, float value, int num_elements){
    int i = threadIdx.x + blockIdx.x*blockDim.x;

    if (i < num_elements)
        rowVector_d[i] = value;
}

int main(int argc, char **argv)
{
    cusparseStatus_t cusparseStat = CUSPARSE_STATUS_SUCCESS;
    // alloc and init input arrays on host (CPU)
    int n = 11;  // number of nonzeros
    float *csrval = new float[n];
    for (int i = 0; i < n; i++) csrval[i] = i + 1;

    // column index of each nonzero, row by row
    int* csrcol = new int[n];
    csrcol[0] = 0;
    csrcol[1] = 2;
    csrcol[2] = 4;
    csrcol[3] = 1;
    csrcol[4] = 3;
    csrcol[5] = 2;
    csrcol[6] = 1;
    csrcol[7] = 3;
    csrcol[8] = 0;
    csrcol[9] = 2;
    csrcol[10] = 4;

    // row pointers: row i holds nonzeros csrrow[i] .. csrrow[i+1]-1
    int* csrrow = new int[6];

    csrrow[0] = 0;
    csrrow[1] = 3;
    csrrow[2] = 5;
    csrrow[3] = 6;
    csrrow[4] = 8;
    csrrow[5] = 11;

    float* csrval_d;
    int *csrcol_d, *csrrow_d;

    cudaMalloc((void**)&csrval_d, n*sizeof(float));
    cudaMalloc((void**)&csrrow_d, 6*sizeof(int));
    cudaMalloc((void**)&csrcol_d, n*sizeof(int));

    cudaMemcpy(csrrow_d, csrrow, 6*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(csrcol_d, csrcol, n*sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(csrval_d, csrval, n*sizeof(float), cudaMemcpyHostToDevice);

    // input vector x (5 entries), filled with ones on the device
    float* rowvector_d;
    cudaMalloc((void**)&rowvector_d, 5*sizeof(float));

    dim3 block = dim3(1024, 1, 1);
    int grid_x = (5 + block.x - 1)/block.x;  // enough threads for the 5 entries of x
    int grid_y = 1;
    int grid_z = 1;
    dim3 grid = dim3(grid_x, grid_y, grid_z);
    d_set_value<<<grid, block>>>(rowvector_d, 1.0f, 5);

    cusparseMatDescr_t descrA;
    cusparseCreateMatDescr(&descrA);
    cusparseSetMatType(descrA, CUSPARSE_MATRIX_TYPE_GENERAL);
    cusparseSetMatIndexBase(descrA, CUSPARSE_INDEX_BASE_ZERO);

    cusparseHandle_t cusparseHandle;
    cusparseCreate(&cusparseHandle);

    float alpha = 1.0f;
    float beta = 0.0f;

    // output vector y (5 entries)
    float *norm_d;
    cudaMalloc((void**)&norm_d, 5*sizeof(float));
    cudaMemset(norm_d, 0, 5*sizeof(float));

    // y = alpha * A^T * x + beta * y
    // With CUDA 9.1 this call returns CUSPARSE_STATUS_INVALID_VALUE.
    cusparseStat = cusparseScsrmv_mp(cusparseHandle, CUSPARSE_OPERATION_TRANSPOSE,
        5, 5, 11, &alpha, descrA, csrval_d,
        csrrow_d, csrcol_d, rowvector_d, &beta, norm_d);

    if (CUSPARSE_STATUS_SUCCESS != cusparseStat)
        std::cout << cusparseStat << std::endl;
    else
    {
        // reuse csrval (11 floats) as a host buffer for the 5 results
        cudaMemcpy(csrval, norm_d, 5*sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < 5; i++) cout << i << ": " << csrval[i] << endl;
    }

    // release library, device, and host resources
    cusparseDestroy(cusparseHandle);
    cusparseDestroyMatDescr(descrA);
    cudaFree(norm_d);
    cudaFree(rowvector_d);
    cudaFree(csrval_d);
    cudaFree(csrcol_d);
    cudaFree(csrrow_d);
    delete[] csrval;
    delete[] csrcol;
    delete[] csrrow;
    return 0;
}
```

I hope this helps.

Thanks

It appears that this cuSPARSE function (csrmv_mp) does not support the transpose operation type. The documentation is in error and should be updated in the next release.

I don’t believe there are any plans to support the transpose operation type.

If you need the transpose operation, the recommended approach is to transpose the matrix separately first, and then call csrmv_mp with the non-transpose operation.

You could also revert to cusparseScsrmv instead; however, that function is also quite a bit slower with the transpose operation type.