Hi,
I am using cuSPARSE for some sparse matrix operations, but I only get about a 2x speedup over Intel MKL.
My GPU is a GTX 590 (I am only using one GPU for now) and my CPU is a quad-core Intel i7-2600K. All the matrices
are of size 15k x 15k. The sparsity of matrixOne is 16.38%; the matrices in the array matrixTwo all have the same sparsity, 2.43%.
Here is the pseudocode of my program:
vector v0, v1, v2;
cudaStream_t stream[3];
sparseMatrix matrixOne, matrixTwo[3];
for (int i = 0; i < 5; i++) {
    // Multiply each vector by matrixOne, one stream per vector.
    cusparseSetKernelStream(handle, stream[0]);
    v0 = cusparseScsrmv(matrixOne, v0);
    cusparseSetKernelStream(handle, stream[1]);
    v1 = cusparseScsrmv(matrixOne, v1);
    cusparseSetKernelStream(handle, stream[2]);
    v2 = cusparseScsrmv(matrixOne, v2);
    // Multiply each vector by its own matrixTwo[k], on the same stream as above.
    cusparseSetKernelStream(handle, stream[0]);
    v0 = cusparseScsrmv(matrixTwo[0], v0);
    cusparseSetKernelStream(handle, stream[1]);
    v1 = cusparseScsrmv(matrixTwo[1], v1);
    cusparseSetKernelStream(handle, stream[2]);
    v2 = cusparseScsrmv(matrixTwo[2], v2);
}
Is a 2x speedup reasonable? It is below my expectations. Could you give me some suggestions on how to improve the speed (to at least 10x faster)? Thanks a lot.