I am using cuSPARSE to implement a Preconditioned Conjugate Gradient solver, in which the SpMV product is called many times.
I am working on an optimization of the solver, and for that I need to know whether cuSPARSE implements SpMV with the scalar kernel (one thread per row), the vector kernel (one warp per row), or some other variant described in https://www.nvidia.com/docs/IO/77944/sc09-spmv-throughput.pdf.
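cusparsecsrmv() is deprecated and has been replaced by cusparseSpMV(). cusparsecsrmv() used a subwarp-to-row mapping: a group of threads smaller than a full warp cooperates on each row, which makes it a variant of the vector kernel rather than the scalar one. By contrast, the state-of-the-art approaches based on nonzero splitting all assign work per thread.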
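
To make the difference between the mappings concrete, here is a minimal sketch of a scalar (thread-per-row) and a subwarp-per-row CSR kernel, in the style of the paper linked in the question. This is not cuSPARSE's source: the names spmv_csr_scalar, spmv_csr_subwarp, spmv_demo_max_error and the subwarp size SUB = 4 are made up for illustration, and the library's real kernels and heuristics may differ.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Scalar mapping: one thread computes one full row of y = A * x.
__global__ void spmv_csr_scalar(int n_rows, const int* row_ptr, const int* col_idx,
                                const float* vals, const float* x, float* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Subwarp mapping: SUB threads (a power of two, at most 32) share one row.
// SUB == 32 would give the classic warp-per-row "vector" kernel.
// Assumes blockDim.x is a multiple of 32 so every warp is full.
template <int SUB>
__global__ void spmv_csr_subwarp(int n_rows, const int* row_ptr, const int* col_idx,
                                 const float* vals, const float* x, float* y) {
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / SUB;   // row handled by this subwarp
    int lane = tid % SUB;   // position of this thread inside the subwarp
    float sum = 0.0f;
    if (row < n_rows)
        for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += SUB)
            sum += vals[j] * x[col_idx[j]];
    // Tree reduction inside each group of SUB lanes; lane 0 ends up with the row sum.
    for (int off = SUB / 2; off > 0; off /= 2)
        sum += __shfl_down_sync(0xffffffffu, sum, off, SUB);
    if (row < n_rows && lane == 0) y[row] = sum;
}

// Hypothetical test driver: runs both kernels on a tridiagonal matrix and
// returns the largest deviation from a CPU reference (-1 on a CUDA error).
// Per-call error checking is omitted to keep the sketch short.
float spmv_demo_max_error() {
    const int n = 1000;
    std::vector<int> row_ptr(n + 1, 0), col_idx;
    std::vector<float> vals, x(n), ref(n, 0.0f), y(n);
    for (int i = 0; i < n; ++i) x[i] = 1.0f + 0.001f * i;
    for (int i = 0; i < n; ++i) {
        for (int c = i - 1; c <= i + 1; ++c) {
            if (c < 0 || c >= n) continue;
            col_idx.push_back(c);
            vals.push_back(c == i ? 2.0f : -1.0f);
            ref[i] += vals.back() * x[c];
        }
        row_ptr[i + 1] = static_cast<int>(col_idx.size());
    }
    const int nnz = static_cast<int>(vals.size());

    int *d_rp, *d_ci;
    float *d_v, *d_x, *d_y;
    cudaMalloc(&d_rp, (n + 1) * sizeof(int));
    cudaMalloc(&d_ci, nnz * sizeof(int));
    cudaMalloc(&d_v, nnz * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_rp, row_ptr.data(), (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ci, col_idx.data(), nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, vals.data(), nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 128;
    constexpr int SUB = 4;
    float max_err = 0.0f;
    cudaError_t status = cudaSuccess;
    for (int variant = 0; variant < 2 && status == cudaSuccess; ++variant) {
        cudaMemset(d_y, 0, n * sizeof(float));
        if (variant == 0)
            spmv_csr_scalar<<<(n + block - 1) / block, block>>>(n, d_rp, d_ci, d_v, d_x, d_y);
        else
            spmv_csr_subwarp<SUB><<<(n * SUB + block - 1) / block, block>>>(n, d_rp, d_ci, d_v, d_x, d_y);
        // The blocking copy also waits for the kernel and reports its errors.
        status = cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            max_err = std::fmax(max_err, std::fabs(y[i] - ref[i]));
    }
    cudaFree(d_rp); cudaFree(d_ci); cudaFree(d_v); cudaFree(d_x); cudaFree(d_y);
    return status == cudaSuccess ? max_err : -1.0f;
}
```

With the scalar kernel, neighbouring threads read from different rows, so accesses to vals and col_idx are strided; with the subwarp kernel, the SUB lanes of a group read consecutive entries of the same row. A small SUB tends to suit matrices with few nonzeros per row, such as the tridiagonal one used in this sketch, because fewer lanes sit idle.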