CUSPARSE implementation of SpMV

Hi all,

I am using cuSPARSE to implement a Preconditioned Conjugate Gradient solver. The solver computes the SpMV product many times.
I am developing an optimization of the solver, and for it I need to know whether cuSPARSE implements the SpMV product with the scalar kernel, the vector kernel, or some other variant (see https://www.nvidia.com/docs/IO/77944/sc09-spmv-throughput.pdf).

Would it be possible to obtain this information?

Thank you,
Sergi

cuSPARSE does not implement the algorithm proposed in the paper you point to. cusparseSpMV follows a nonzero-splitting approach. See “Merge-Based Parallel Sparse Matrix-Vector Multiplication”.
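
To make "nonzero splitting" concrete, here is a minimal sequential C++ sketch of the general idea. It is not cuSPARSE's actual code, and it is simpler than the full merge-based algorithm from the paper. The names `CsrMatrix`, `spmv_nonzero_split` and `num_workers` are hypothetical, and each loop iteration over `w` only stands in for one GPU thread. The point is that the nonzeros, not the rows, are divided evenly, and each worker locates its starting row with a binary search on the row offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CSR matrix; row_offsets has num_rows + 1 entries (hypothetical helper type).
struct CsrMatrix {
    std::vector<int> row_offsets;
    std::vector<int> col_indices;
    std::vector<double> values;
};

// y = A * x, with the nonzeros (not the rows) divided evenly among workers.
// Sequential emulation: each iteration over w stands in for one GPU thread.
std::vector<double> spmv_nonzero_split(const CsrMatrix& A,
                                       const std::vector<double>& x,
                                       int num_workers) {
    assert(num_workers > 0);
    assert(!A.row_offsets.empty());
    const int num_rows = static_cast<int>(A.row_offsets.size()) - 1;
    const int nnz = static_cast<int>(A.values.size());
    std::vector<double> y(num_rows, 0.0);
    const int chunk = (nnz + num_workers - 1) / num_workers;  // nonzeros per worker

    for (int w = 0; w < num_workers; ++w) {
        const int begin = std::min(w * chunk, nnz);
        const int end = std::min(begin + chunk, nnz);
        if (begin == end) continue;  // more workers than nonzeros

        // Binary search: the row that owns nonzero `begin` is the last row
        // whose offset is <= begin (this also skips empty rows).
        int row = static_cast<int>(
            std::upper_bound(A.row_offsets.begin(), A.row_offsets.end(), begin) -
            A.row_offsets.begin()) - 1;

        double sum = 0.0;
        for (int k = begin; k < end; ++k) {
            // Crossed a row boundary: flush the partial sum and advance.
            while (k >= A.row_offsets[row + 1]) {
                y[row] += sum;  // on a GPU this needs an atomic add or a fix-up pass
                sum = 0.0;
                ++row;
            }
            sum += A.values[k] * x[A.col_indices[k]];
        }
        y[row] += sum;  // the last row of the chunk may be shared with the next worker
    }
    return y;
}
```

Every worker handles at most `chunk` nonzeros regardless of how long individual rows are, which is what makes this family of methods insensitive to skewed row lengths. The price is that a row can straddle two workers, so partial sums have to be combined.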

Does cusparsecsrmv() in CUDA Toolkit version 10.2.89 also use this merge-based approach?

The paper you cited states that version 7.5 of cuSPARSE uses vectorization. I understand that at some point the implementation changed to nonzero splitting.

Also, is the nonzero splitting per-thread, per-warp, or per-SM?

Thank you,
Sergi

EDIT: Added question and changed misunderstood text.

cusparsecsrmv() is deprecated; it has been replaced by cusparseSpMV(). cusparsecsrmv() used a subwarp-to-row mapping. For nonzero splitting, all state-of-the-art approaches use a per-thread strategy.
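
For readers unfamiliar with the term, a subwarp-to-row mapping can be sketched as follows. This is a sequential C++ emulation under stated assumptions, not the cusparsecsrmv() source: the function name `spmv_subwarp_per_row` and the `subwarp_size` parameter are hypothetical, and the loop over `lane` stands in for threads that would run in lockstep on a GPU. With `subwarp_size` equal to 1 it degenerates to the scalar kernel (one thread per row), and with 32 it corresponds to the vector kernel (one warp per row) from the SC09 paper linked in the question.

```cpp
#include <cassert>
#include <vector>

// y = A * x for a CSR matrix, emulating a subwarp-per-row mapping.
// Assumption: subwarp_size is a power of two no larger than a 32-thread warp.
std::vector<double> spmv_subwarp_per_row(const std::vector<int>& row_offsets,
                                         const std::vector<int>& col_indices,
                                         const std::vector<double>& values,
                                         const std::vector<double>& x,
                                         int subwarp_size) {
    assert(subwarp_size > 0 && subwarp_size <= 32 &&
           (subwarp_size & (subwarp_size - 1)) == 0);
    assert(!row_offsets.empty());
    const int num_rows = static_cast<int>(row_offsets.size()) - 1;
    std::vector<double> y(num_rows, 0.0);

    for (int row = 0; row < num_rows; ++row) {  // one subwarp is assigned to each row
        std::vector<double> partial(subwarp_size, 0.0);

        // Strided pass: lane l handles nonzeros l, l + S, l + 2S, ... of the row,
        // so neighbouring lanes touch neighbouring memory (coalesced on a GPU).
        for (int lane = 0; lane < subwarp_size; ++lane) {
            for (int k = row_offsets[row] + lane; k < row_offsets[row + 1];
                 k += subwarp_size) {
                partial[lane] += values[k] * x[col_indices[k]];
            }
        }

        // Tree reduction across lanes; on a GPU this is typically done with
        // warp shuffles such as __shfl_down_sync.
        for (int stride = subwarp_size / 2; stride > 0; stride /= 2) {
            for (int lane = 0; lane < stride; ++lane) {
                partial[lane] += partial[lane + stride];
            }
        }
        y[row] = partial[0];
    }
    return y;
}
```

In a row-based mapping like this one, the work per subwarp follows the row length, so a few very long rows can leave most lanes idle. That load imbalance is what the per-thread nonzero-splitting strategy mentioned above is designed to avoid.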
