Using cusparseSpMM, I perform the matrix multiplication A (compressed) * B.

According to the documentation, the row-major layout is faster than column-major.

I compress matrix A into both CSR and CSC formats (row-major order).

Then I perform the multiplications A (CSR format) * B and A (CSC format) * B.

I expected the speed difference to be negligible, but the CSR version was 1.5 to 2 times as fast.

Why is the speed difference larger than I expected?

Thank you for your reply.

I used the CUSPARSE_SPMM_ALG_DEFAULT algorithm for both SpMM calls.

M, N, and K are all 2048, and the sparsity of matrix A is set to 99%.
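For scale, here is a rough count of the work under these parameters, assuming "99% sparsity" means that 1% of A's entries are nonzero and counting one multiply and one add per nonzero of A per column of B:

```latex
\mathrm{nnz}(A) \approx 0.01 \times 2048 \times 2048 = 0.01 \times 4\,194\,304 \approx 41\,943
\qquad
\text{FLOPs} \approx 2 \cdot \mathrm{nnz}(A) \cdot N = 2 \times 41\,943 \times 2048 \approx 1.72 \times 10^{8}
```

That is only about 20 nonzeros per row of A on average, so the arithmetic per output element is small; it seems plausible (an inference, not a measurement) that memory access and write patterns, rather than FLOPs, dominate the runtime of both variants.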

CSC SpMM has the same performance as CSR SpMM computing A^T * B, because A * B with CSC cannot be computed the same way as A * B with CSR.

Indeed, atomic operations are needed to accumulate the final result, and this affects performance.

Does this mean that the CSC and CSR formats of matrix A must have the same memory layout (**csrValA, cscValA**) to get the same performance?

No. It means that SpMM with CSC (that is, CSR^T) and SpMM with CSR use entirely different algorithms, because the two formats represent the same matrix in different ways (by columns or by rows).