Performance using SELL format with cusparseSpMV

Hi, I’ve recently used the SELL format with cusparseSpMV. However, I found that the performance is worse than with the CSR format. I expected the opposite, since the SELL layout allows much more coalesced memory access and should therefore perform better.

The sparse matrix I used for testing is 400,000 by 400,000 and comes from a FEM problem. Its arrays were allocated and filled on the device via cudaMalloc, cudaMemcpy, etc. I didn’t pad the y vector (Ax = y), since it seems that cusparseSpMV doesn’t require it.

Do you have any hints on this? The documentation says little about SELL format usage. For instance, would the sliceSize make a difference? Thanks.

The SELL format provides better performance than CSR if you have a uniform sparsity pattern, i.e. a similar number of non-zero elements per row.
If you are able to provide the input matrix, we can help you figure out if the problem is on our side.
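To make the point about uniform sparsity concrete, here is a minimal host-side sketch (plain C++, not cuSPARSE code) that estimates how much padding a sliced-ELL layout carries for a given list of row lengths and slice size. The function names `sellStoredValues` and `sellPaddingRatio` are hypothetical, and the sketch assumes every slice, including the last one, is padded to the full slice height and to the length of its longest row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Number of value slots a sliced-ELL layout would store for these row lengths.
// Assumption: each slice of `sliceSize` consecutive rows is padded to the
// length of its longest row, and the last slice is padded to a full slice.
std::size_t sellStoredValues(const std::vector<std::size_t>& rowLengths,
                             std::size_t sliceSize) {
    assert(sliceSize > 0);
    std::size_t total = 0;
    for (std::size_t start = 0; start < rowLengths.size(); start += sliceSize) {
        const std::size_t end = std::min(start + sliceSize, rowLengths.size());
        const auto first = rowLengths.begin() + static_cast<std::ptrdiff_t>(start);
        const auto last = rowLengths.begin() + static_cast<std::ptrdiff_t>(end);
        const std::size_t widest = *std::max_element(first, last);
        total += widest * sliceSize;
    }
    return total;
}

// Stored slots divided by actual non-zeros; 1.0 means no padding at all.
double sellPaddingRatio(const std::vector<std::size_t>& rowLengths,
                        std::size_t sliceSize) {
    const std::size_t nnz =
        std::accumulate(rowLengths.begin(), rowLengths.end(), std::size_t{0});
    if (nnz == 0) return 1.0;  // empty matrix: nothing to pad against
    return static_cast<double>(sellStoredValues(rowLengths, sliceSize)) /
           static_cast<double>(nnz);
}
```

With uniform rows the ratio stays at 1.0, while a single long row inflates every slice it falls into. A larger slice size spreads that inflation over more rows, which is one way the sliceSize can matter.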

Hi @fbusato, thanks for your reply. Actually, the time I measured is for solving Ax=b with a preconditioned Conjugate Gradient. The only difference between the two solves is the format of A, which is either CSR or SELL. Please note that I didn’t re-order A to build a sorted SELL, which bundles rows with a similar number of non-zeros together, because that requires permuting b as well, which I think may cost more time in total. Also, is it possible to put x into texture memory before I pass it to cusparseSpMV? Or is that already handled inside cusparseSpMV?
Attached please find the files for the matrix A, the rhs b, and also a solution x. I think, as you said, you could check A, which is used in many SpMV calls during the CG iterations.

Btw, I just noticed that cusparseSpMV_preprocess was added in cuSPARSE 12.4. Is this routine required for all formats before calling cusparseSpMV, or is it optional? I mean, even for CSR, can I choose whether to skip it?

Thanks in advance.

Attached: matrix files
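The row re-ordering mentioned above (sorting rows by length before building SELL, then permuting b to match) can be sketched on the host as follows. This is only an illustration in plain C++ under stated assumptions: the helper names `rowsByDescendingLength`, `gatherByPermutation`, and `scatterByPermutation` are hypothetical, and the sketch covers only the row permutation and the matching vector shuffles, not the rebuild of the matrix itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Permutation that lists row indices from longest to shortest row.
// stable_sort keeps the original order among rows of equal length.
std::vector<std::size_t> rowsByDescendingLength(
    const std::vector<std::size_t>& rowLengths) {
    std::vector<std::size_t> perm(rowLengths.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&rowLengths](std::size_t a, std::size_t b) {
                         return rowLengths[a] > rowLengths[b];
                     });
    return perm;
}

// Reorder a vector to match the permuted rows: out[i] = v[perm[i]].
std::vector<double> gatherByPermutation(const std::vector<double>& v,
                                        const std::vector<std::size_t>& perm) {
    std::vector<double> out(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) out[i] = v[perm[i]];
    return out;
}

// Undo the reordering: out[perm[i]] = v[i].
std::vector<double> scatterByPermutation(const std::vector<double>& v,
                                         const std::vector<std::size_t>& perm) {
    std::vector<double> out(perm.size());
    for (std::size_t i = 0; i < perm.size(); ++i) out[perm[i]] = v[i];
    return out;
}
```

The permutation is computed once per matrix, so in an iterative solver its cost is amortized over all SpMV calls. Whether that pays off depends on the matrix and the iteration count.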

Hi @417luke318, let me reply to the main questions. We will check the matrix soon.

is it possible to put x into texture memory before I pass it to cusparseSpMV? Or is that already handled inside cusparseSpMV?

It makes little sense to use texture memory on current architectures. Also, we would need to handle texture memory in a special way, so we don’t support it. In summary, A, b, and x must be in standard device memory.

I just noticed that cusparseSpMV_preprocess was added in cuSPARSE 12.4. Is this routine required for all formats before calling cusparseSpMV, or is it optional? I mean, even for CSR, can I choose whether to skip it?

Yes, the preprocessing step is optional for any format.

Hi @417luke318,
As @fbusato mentioned, the SELL format provides better performance when you have a uniform sparsity pattern. The matrix provided has a widely dispersed number of non-zero elements per row (a mean of 77.9, a standard deviation of 65.8, and a maximum row length of 11,723). Thus, CSR is a better choice for this matrix.

Thanks
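For readers who want to run the same kind of check on their own matrix, the row-length statistics quoted above can be computed from the CSR row-offsets array. The following is a minimal host-side sketch in plain C++. The struct and function names are hypothetical, and it assumes a population standard deviation, since the thread does not say which variant was used.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of the number of non-zeros per row.
struct RowLengthStats {
    double mean;
    double stddev;       // population standard deviation (assumption)
    std::size_t maxLen;
};

// csrRowOffsets has (rows + 1) entries; row i holds
// csrRowOffsets[i + 1] - csrRowOffsets[i] non-zeros.
RowLengthStats rowLengthStats(const std::vector<int>& csrRowOffsets) {
    RowLengthStats stats{0.0, 0.0, 0};
    if (csrRowOffsets.size() < 2) return stats;  // no rows
    const std::size_t rows = csrRowOffsets.size() - 1;

    // First pass: mean and maximum row length.
    double sum = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        const std::size_t len =
            static_cast<std::size_t>(csrRowOffsets[i + 1] - csrRowOffsets[i]);
        sum += static_cast<double>(len);
        stats.maxLen = std::max(stats.maxLen, len);
    }
    stats.mean = sum / static_cast<double>(rows);

    // Second pass: squared deviations from the mean.
    double sqDev = 0.0;
    for (std::size_t i = 0; i < rows; ++i) {
        const double d =
            static_cast<double>(csrRowOffsets[i + 1] - csrRowOffsets[i]) - stats.mean;
        sqDev += d * d;
    }
    stats.stddev = std::sqrt(sqDev / static_cast<double>(rows));
    return stats;
}
```

A standard deviation close to the mean, or a maximum far above it, suggests that SELL slices would carry a lot of padding and that CSR is likely the safer choice.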

@fbusato Thanks a lot for the clarification!

Thank you @malmasri! Now I understand the reason behind it :).