I have recently started using cuSPARSE in my application to accelerate SpGEMM. While going through the documentation, I noticed that there are two functions, csrgemm() and csrgemm2(), that accomplish this task.
However, I do not quite understand the difference between these two functions, especially in terms of performance. Has anyone run experiments and can explain it to me? I would be grateful for any help.
“We provide csrgemm2 as a generalization of csrgemm. It provides more operations in terms of alpha and beta. For example, C = -A*B+D can be done by csrgemm2.”
So csrgemm2 does operations in a single library call that csrgemm cannot. csrgemm2 provides support for alpha and beta so that you can do this:
C = alpha ∗ A ∗ B + beta ∗ D
whereas csrgemm can only do this:
C = op ( A ) ∗ op ( B )
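To make the difference between the two formulas concrete, here is a minimal CPU reference sketch in plain Python. It is not cuSPARSE code and makes no claim about how the library implements either routine. The helper names `spgemm` and `spgemm2` and the dict-of-rows storage are assumptions made purely for illustration, and op is taken to be the identity.

```python
# CPU reference sketch (NOT cuSPARSE): a sparse matrix is stored as
# {row_index: {col_index: value}}. Names and layout are illustrative only.

def spgemm(A, B):
    """Return C = A * B, the plain product form (op = identity)."""
    C = {}
    for i, row_a in A.items():
        acc = {}
        for k, a_ik in row_a.items():
            # Row k of B contributes a_ik * B[k, j] to C[i, j].
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

def spgemm2(alpha, A, B, beta, D):
    """Return C = alpha * A * B + beta * D, the generalized form."""
    # Scale the product by alpha.
    C = {i: {j: alpha * v for j, v in row.items()}
         for i, row in spgemm(A, B).items()}
    # Add beta * D entry by entry.
    for i, row_d in D.items():
        out = C.setdefault(i, {})
        for j, d_ij in row_d.items():
            out[j] = out.get(j, 0.0) + beta * d_ij
    return C

A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
B = {0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}
D = {0: {0: 1.0}, 1: {1: 1.0}}  # identity matrix

C_plain = spgemm(A, B)                   # C = A * B
C_general = spgemm2(-1.0, A, B, 1.0, D)  # C = -A * B + D
```

With the plain product alone, getting C = -A*B + D would need a separate scale-and-add step after the multiplication. The generalized form folds that into one call, which is the convenience described in the quoted documentation.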
If your operation can be expressed with csrgemm, it is unlikely to run faster with csrgemm2. csrgemm2 was probably added to the API after csrgemm, with csrgemm kept for compatibility reasons. csrgemm may also be slightly faster when you don't need alpha and beta.