Why is cuSPARSE only 2x faster than MKL?

Hi,

I am using cuSPARSE for some sparse matrix operations, but I only get a 2x speedup over Intel MKL.

My GPU is a GTX 590 (for now I am only using one of its two GPUs) and my CPU is a quad-core Intel Core i7-2600K. All the matrices are 15k x 15k. The density (fraction of nonzero entries) of matrixOne is 16.38%, and all the matrices in the array matrixTwo have the same density of 2.43%.

Here is the pseudocode for my program:

    vector v0, v1, v2;
    cudaStream_t stream[3];
    sparseMatrix matrixOne, matrixTwo[3];

    for (int i = 0; i < 5; i++) {
        cusparseSetKernelStream(handle, stream[0]);
        v0 = cusparseScsrmv(matrixOne, v0);
        cusparseSetKernelStream(handle, stream[1]);
        v1 = cusparseScsrmv(matrixOne, v1);
        cusparseSetKernelStream(handle, stream[2]);
        v2 = cusparseScsrmv(matrixOne, v2);

        cusparseSetKernelStream(handle, stream[0]);
        v0 = cusparseScsrmv(matrixTwo[0], v0);
        cusparseSetKernelStream(handle, stream[1]);
        v1 = cusparseScsrmv(matrixTwo[1], v1);
        cusparseSetKernelStream(handle, stream[2]);
        v2 = cusparseScsrmv(matrixTwo[2], v2);
    }

Is a 2x speedup reasonable? It is below what I expected. Do you have any suggestions for improving performance (ideally to at least 10x faster than MKL)? Thanks a lot.

Are you running in single or double precision? If you are using double precision, you will get a much lower speedup, because consumer cards do not have full double-precision capability.